LINEAR-TIME COMPRESSION OF BOUNDED-GENUS GRAPHS INTO INFORMATION-THEORETICALLY OPTIMAL NUMBER OF BITS∗
HSUEH-I LU†
Abstract. A compression scheme A for a class G of graphs consists of an encoding algorithm EncodeA that computes a binary string CodeA(G) for any given graph G in G and a decoding algorithm DecodeA that recovers G from CodeA(G). A compression scheme A for G is optimal if both EncodeA and DecodeA run in linear time and the number of bits of CodeA(G) for any n-node graph G in G is information-theoretically optimal to within lower-order terms. Trees and plane triangulations were the only known nontrivial graph classes to admit optimal compression schemes.
Based upon Goodrich’s separator decomposition for planar graphs and Djidjev and Venkatesan’s planarizers for bounded-genus graphs, we give an optimal compression scheme for any hereditary (i.e., closed under taking subgraphs) class G under the premise that any n-node graph of G to be encoded comes with a genus-$o(n/\log^2 n)$ embedding. By Mohar’s linear-time algorithm that embeds a bounded-genus graph on a genus-O(1) surface, our result implies that any hereditary class of genus-O(1) graphs admits an optimal compression scheme. For instance, our result yields the first-known optimal compression schemes for planar graphs, plane graphs, graphs embedded on genus-1 surfaces, graphs with genus 2 or less, 3-colorable directed plane graphs, 4-outerplanar graphs, and forests with degree at most 5. For nonhereditary graph classes, we also give a methodology for obtaining optimal compression schemes. From this methodology, we give the first-known optimal compression schemes for triangulations of genus-O(1) surfaces and floorplans.
Key words. trees, planar graphs, graph algorithms, data structures, compression
AMS subject classifications. 05C05, 05C10, 05C85, 68P05, 68P30
DOI. 10.1137/120879142
1. Introduction. Compact representations of graphs are fundamentally important and useful in many applications, including representing the meshes in finite element analysis, terrain models of GIS, three-dimensional (3D) models of graphics [48, 64, 80, 81, 82, 85, 89, 92], and VLSI design [56, 84], designing compact routing tables of computer networks [1, 3, 16, 35, 36, 38, 66, 77, 94, 95], and compressing the link structure of the Internet [2, 5, 7, 15, 21, 88]. Let G be a class of graphs. Let num(G, n) denote the number of distinct n-node graphs in G. The information-theoretically optimal number of bits to encode an n-node graph in G is log num(G, n).¹ For instance, if G is the class of rooted trees, then $\mathrm{num}(G, n) \approx 2^{2n}/n^{3/2}$ and log num(G, n) = 2n − O(log n); if G is the class of plane triangulations, then $\log \mathrm{num}(G, n) = n \log\frac{256}{27} + o(n) \approx 3.2451n + o(n)$ [97]. A compression scheme A for G consists of an encoding algorithm EncodeA that computes a binary string CodeA(G) for any given graph G in G and a decoding algorithm DecodeA that recovers graph G from CodeA(G). A compression scheme A for a graph class G with log num(G, n) = O(n)
∗Received by the editors May 29, 2012; accepted for publication (in revised form) January 9, 2014;
published electronically March 20, 2014. A preliminary version of this paper appeared in Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2002, pp. 223–224. This research was supported in part by NSC grant 101–2221–E–002–062–MY3.
http://www.siam.org/journals/sicomp/43-2/87914.html
†Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan, ROC (hil@csie.ntu.edu.tw, http://www.csie.ntu.edu.tw/~hil/). The author also holds joint appointments from the Graduate Institute of Networking and Multimedia and the Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University.
¹All logarithms throughout the paper are to base two.
is optimal if the following three conditions hold:
Condition C1. The running time of algorithm EncodeA(G) is linear in the size of G.
Condition C2. The running time of algorithm DecodeA(CodeA(G)) is linear in the bit count of CodeA(G).
Condition C3. For all positive constants β with log num(G, n) ≤ βn + o(n), the bit count of CodeA(G) for an n-node graph G in G is no more than βn + o(n).
Note that Condition C3 basically says the bit count of CodeA(G) is information-theoretically optimal to within lower-order terms. Although there has been considerable work on compression schemes, trees (see, e.g., [11, 50, 67, 72]) and plane triangulations [79] were the only known nontrivial graph classes to admit optimal compression schemes. A graph class is hereditary if it is closed under taking subgraphs.
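As a numeric illustration of the target in Condition C3, the following Python sketch evaluates log num(G, n) for rooted trees. It assumes that "rooted trees" is read as rooted ordered trees, whose number on n nodes is the Catalan number with index n − 1; the function name is hypothetical and not part of the paper.

```python
from math import comb, log2


def log_num_ordered_trees(n):
    """log2 of the number of n-node rooted ordered trees (assumed class)."""
    m = n - 1                               # an n-node tree has n - 1 edges
    catalan = comb(2 * m, m) // (m + 1)     # Catalan number C_m, exact integer
    return log2(catalan)                    # math.log2 accepts large integers


n = 1000
gap = 2 * n - log_num_ordered_trees(n)      # the O(log n) slack below 2n bits
```

Under this reading, an encoding of n-node rooted trees is optimal in the sense of Condition C3 when it uses 2n + o(n) bits.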
Below is the main result of the paper.
Theorem 1.1. Any hereditary class G of graphs with log num(G, n) = O(n) admits an optimal compression scheme, as long as each input n-node graph in G to be encoded comes with a genus-$o(n/\log^2 n)$ embedding.
By Theorem 1.1 and Mohar’s linear-time genus-O(1) embedding algorithm for genus-O(1) graphs [54, 70] (see Lemma 2.5), any hereditary class of genus-O(1) graphs admits an optimal compression scheme. For instance, our result yields the first-known optimal compression schemes for planar graphs, plane graphs, graphs embedded on genus-1 surfaces, graphs with genus 2 or less, 3-colorable directed plane graphs, 4-outerplanar graphs, and forests with degree at most 5. For nonhereditary graph classes, we also give an extension (see Corollary 5.1) of Theorem 1.1. As summarized in the following theorem, we show two classes of genus-O(1) graphs whose optimal compression schemes are obtainable via this extension, where the class of floorplans is defined in related work below.
Theorem 1.2. The following two classes of graphs admit optimal compression schemes:
(1) triangulations of a genus-g surface for any integral constant g,
(2) floorplans.
Technical overview. The kernel of the proof of Theorem 1.1 is a linear-time disjoint partition $G_0, \ldots, G_p$ of an n-node graph G embedded on a genus-$o(n/\log^2 n)$ surface.² Let poly(n) denote $O(n^{O(1)})$. Based upon Goodrich’s separator decomposition of planar graphs [40] and Djidjev and Venkatesan’s planarizer [26], partition $G_0, \ldots, G_p$ satisfies the following conditions, where $n_i$ is the number of nodes of $G_i$ and $d_i$ is the number of times that the nodes of $G_i$ are duplicated in some $G_j$ with $j \ne i$:³
(a) $n_0 = o(n/\log n)$,
(b) $n_i = \mathrm{poly}(\log n)$ holds for each i = 1, 2, . . . , p,
(c) $\sum_{i=1}^{p} d_i = o(n/\log n)$, and
(d) $\sum_{i=0}^{p} n_i = n + o(n/\log n)$.
By condition (a), $G_0$ can be encoded in o(n) bits. By conditions (b) and (c), the information required to recover G from $G_0, G_1, \ldots, G_p$ can be encoded into o(n) bits (see Lemma 4.1). By condition (d), we have $\log \mathrm{num}(G, n) \le o(n) + \sum_{i=1}^{p} \log \mathrm{num}(G, n_i)$. Therefore, the disjoint partition reduces the problem of encoding an n-node graph in G to the problem of encoding a poly(log n)-node graph in G. Applying such a reduction for one more level, it remains to encode a poly(log log n)-node graph in G into an information-theoretically optimal
²Precisely, the disjoint partition G0, . . . , Gp of the edges of the embedded graph G in the proof of Theorem 1.1 is G[V0], G(V1), . . . , G(Vp), where [V0, . . . , Vp] is both (i) a 1-separation S1 of an arbitrary triangulation Δ of G and (ii) a refinement of the 0-separation S0 = [∅, Node(Δ)] of Δ.
³As a matter of fact, in our construction, all duplicated nodes of Gi with i ≥ 1 belong to G0.
Fig. 1. Three floorplans (a), (b), and (c) with 14 nodes, 6 internal faces, and 19 edges. Floorplans (a) and (b) are equivalent, and floorplans (b) and (c) are not equivalent.
number of bits, which can be resolved by the standard technique (see, e.g., [47, 72, 78]) of precomputation tables (see Lemma 2.3).
Related work. The compression scheme of Turán [96] encodes an n-node plane graph that may have self-loops into 12n bits.⁴ Keeler and Westbrook [55] improved this bit count to 10.74n. They also gave compression schemes for several families of plane graphs. In particular, they used 4.62n bits for plane triangulations, and 9n bits for connected plane graphs free of self-loops and degree-one nodes. For plane triangulations, He, Kao, and Lu [46] improved the bit count to 4n. For triconnected plane graphs, He, Kao, and Lu [46] also improved the bit count to at most 8.585n bits. This bit count was later reduced to at most $\frac{9 \log 3}{2} n \approx 7.134n$ by Chuang et al. [20]. For any given n-node graph G embedded on a genus-g surface, Deo and Litow [25] showed an O(ng)-bit encoding for G. These compression schemes all take linear time for encoding and decoding, but Condition C3 does not hold for them. The compression schemes of He, Kao, and Lu [47] (respectively, Blelloch and Farzan [14]) for planar graphs, plane graphs, and plane triangulations (respectively, separable graphs) satisfy Condition C3, but their encoding algorithms require Ω(n log n) time on n-node graphs.
Floorplanning is a fundamental issue in circuit layout [4, 8, 17, 24, 32, 43, 51, 57, 58, 62, 68, 69, 84, 91, 106, 108]. Motivated by VLSI physical design, various representations of floorplans were proposed [33, 109, 110]. Designing a floorplan to meet a certain criterion is NP-complete in general [44, 87, 100], so heuristic techniques such as simulated annealing [17, 101, 102] are practically useful. The length of the encoding affects the size of the search space. A floorplan, which is also known as a rectangular drawing, is a division of a rectangle into rectangular faces using horizontal and vertical line segments. Two floorplans are equivalent if they have the same adjacency relations and relative positions among the nodes. For instance, Figure 1 shows three floorplans: Floorplans (a) and (b) are equivalent. Floorplans (b) and (c) are not equivalent. Let G be the input n-node floorplan. Under the conventional assumption that each node of G, other than the four corner nodes, has exactly three neighbors (see, e.g., [45, 107]), one can verify that G has 0.5n faces and 1.5n − 2 edges. Yamanaka and Nakano [103] showed how to encode G into 2.5n bits. Chuang [19] reduced the bit count to 2.293n. Takahashi, Fujimaki, and Inoue [90] further reduced the bit count to 2n. All these compression schemes for floorplans satisfy Conditions C1 and C2, but not Condition C3. Takahashi, Fujimaki, and Inoue [90] also showed that the number of distinct n-node floorplans is no more than $3.375^{n+o(n)} \approx 2^{1.755n+o(n)}$. Therefore, our Theorem 1.2(2) encodes an n-node floorplan into at most 1.755n bits.
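The counts in this paragraph can be checked with a few lines of arithmetic. The Python sketch below is only an illustration with hypothetical names; it derives the face count from Euler's formula for a connected plane graph, so the 0.5n faces include the outer face (our reading, consistent with the 6 internal faces reported for Figure 1).

```python
from math import log2


def floorplan_counts(n):
    """Edge and face counts of an n-node floorplan under the degree assumption."""
    # Four corner nodes have degree 2 and the other n - 4 nodes have degree 3.
    edges = (2 * 4 + 3 * (n - 4)) // 2      # equals 1.5n - 2 for even n
    faces = 2 - n + edges                   # Euler's formula: n - e + f = 2
    return {"edges": edges, "faces": faces}


# Bits per node implied by the upper bound 3.375^(n + o(n)) on the number
# of n-node floorplans.
bits_per_node = log2(3.375)
```

For the 14-node floorplans of Figure 1 this gives 19 edges and 7 faces, and the per-node bit count rounds to 1.755.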
For applications that require query support, Jacobson [50] gave a Θ(n)-bit encoding for a connected and simple planar graph G that supports traversal in Θ(log n) time per node visited. Munro and Raman [71] improved this result and gave schemes to encode binary trees, rooted ordered trees, and planar graphs. For a general n-node m-edge planar graph G, they used 2m + 8n bits while supporting adjacency and degree queries in O(1) time. Chuang et al. [20] reduced this bit count to $2m + (5 + \frac{1}{k})n$ for any constant k > 0 with the same query support. The bit count can be further reduced if only O(1)-time adjacency queries are supported, or if G is simple, triconnected, or triangulated [20]. Chiang, Lin, and Lu [18] reduced the number of bits to 2m + 2n. Yamanaka and Nakano [105] showed a 6n-bit encoding for plane triangulations with query support. The succinct encodings of Blandford, Blelloch, and Kash [13] and Blelloch and Farzan [14] for separable graphs support queries. Yamanaka and Nakano [104] also gave a compression scheme for floorplans with query support. For labeled planar graphs, Itai and Rodeh [49] gave an encoding of $\frac{3}{2} n \log n$ bits. For unlabeled general graphs, Naor [74] gave an encoding of $\frac{1}{2} n^2$ bits. For certain graph families, Kannan, Naor, and Rudich [52] gave schemes that encode each node with O(log n) bits and support O(log n)-time testing of adjacency between two nodes.
⁴For brevity, we omit all lower-order terms of bit counts in our discussion of related work.
Galperin and Wigderson [34] and Papadimitriou and Yannakakis [75] investigated complexity issues arising from encoding a graph by a small circuit that computes its adjacency matrix. Related work on various versions of succinct graph representations can be found in [6, 9, 28, 29, 30, 31, 37, 42, 53, 73, 76, 83] and the references therein.
Outline. The rest of the paper is organized as follows. Section 2 gives the preliminaries. Section 3 shows our algorithm for computing graph separations. Section 4 gives our optimal compression scheme for hereditary graph classes. Section 5 shows a methodology for obtaining optimal compression schemes for nonhereditary graph classes and applies this methodology on triangulations of genus-O(1) graphs and floorplans. Section 6 concludes the paper with a couple of open questions.
2. Preliminaries. Unless clearly stated otherwise, all graphs throughout the paper are simple, i.e., have no multiple edges or self-loops.
2.1. Segmentation prefix. Let $\|X\|$ denote the number of bits of binary string X. A binary string $X_0$ is a segmentation prefix of binary strings $X_1, \ldots, X_d$ if (a) it takes $O(\sum_{i=1}^{d} \|X_i\|)$ time to compute $X_0$ from $X_1, \ldots, X_d$ and (b) given the concatenation of $X_0, X_1, \ldots, X_d$, it takes $O(\sum_{i=0}^{d} \|X_i\|)$ time to recover all $X_i$ with $1 \le i \le d$.
Lemma 2.1 (see, e.g., [10, 27]). Any binary strings $X_1, \ldots, X_d$ with d = O(1) have a segmentation prefix with $O(\log \sum_{i=1}^{d} \|X_i\|)$ bits.
Lemma 2.2. Any binary strings $X_1, X_2, \ldots, X_d$ have an $O(\min\{m, d \log m\})$-bit segmentation prefix, where $m = \|X_1\| + \cdots + \|X_d\|$.
Proof. Let $X'$ be the concatenation of $X_1, \ldots, X_d$. If $m \le d \log m$, let $X''$ be the m-bit binary string with exactly d copies of 1-bits such that the jth bit of $X''$ is 1 if and only if $j = \|X_1\| + \cdots + \|X_i\|$ holds for some i = 1, . . . , d. Otherwise, let $X''$ store the O(log m)-bit numbers $\|X_1\| + \cdots + \|X_i\|$ for all i = 1, . . . , d. Let $X_0'$ be the segmentation prefix of $X''$ and $X'$ as ensured by Lemma 2.1. The concatenation of $X_0'$ and $X''$ is a segmentation prefix $X_0$ of $X_1, \ldots, X_d$ with $O(\min\{m, d \log m\})$ bits. The lemma is proved.
For the rest of the paper, let $X_1 \circ \cdots \circ X_d$ be the concatenation of $X_0, X_1, \ldots, X_d$, where $X_0$ is the segmentation prefix of $X_1, \ldots, X_d$ as ensured by Lemma 2.2.
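The two cases in the proof of Lemma 2.2 can be sketched in Python on strings of '0' and '1' characters. This is a minimal illustration under stated assumptions, not the paper's implementation: every input string is assumed nonempty, the O(1)-string prefix of Lemma 2.1 is replaced by returning the chosen mode as a separate value, and the names `boundary_info` and `split` are hypothetical.

```python
def boundary_info(strings):
    """Describe where each nonempty X_i ends inside the concatenation X_1...X_d."""
    m, d = sum(len(x) for x in strings), len(strings)
    width = m.bit_length()                  # bits per stored prefix sum, about log m
    ends, total = [], 0
    for x in strings:
        total += len(x)
        ends.append(total)                  # prefix sum of the lengths up to X_i
    if m <= d * width:
        # Bitmap case: bit j (1-based) is 1 iff some X_i ends at position j.
        marks = set(ends)
        return "bitmap", "".join("1" if j in marks else "0" for j in range(1, m + 1))
    # List case: store every prefix sum as a fixed-width binary number.
    return "list", "".join(format(e, f"0{width}b") for e in ends)


def split(mode, info, concatenated):
    """Recover X_1, ..., X_d from their concatenation and the boundary string."""
    if mode == "bitmap":
        ends = [j + 1 for j, bit in enumerate(info) if bit == "1"]
    else:
        width = len(concatenated).bit_length()
        ends = [int(info[t:t + width], 2) for t in range(0, len(info), width)]
    pieces, start = [], 0
    for e in ends:
        pieces.append(concatenated[start:e])
        start = e
    return pieces
```

Many short strings select the bitmap, whose length is m; a few long strings select the list, whose length is d times the number width. Either way the boundary string has at most about min{m, d log m} bits.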
2.2. Precomputation table. Let |S| denote the cardinality of set S. Let Node(G) consist of the nodes in graph G, and let node(G) = |Node(G)|. For any subset V of Node(G), let G[V ] denote the subgraph of G induced by V , and let G\ V
Fig. 2. (a) A 9-node plane graph G. (b) A separator decomposition T of G.
denote the subgraph of G obtained by deleting V and their incident edges. Two disjoint subsets $V$ and $V'$ of Node(G) are adjacent in G if there is an edge $(v, v')$ of G with $v \in V$ and $v' \in V'$. For any subset V of Node(G), let NbrG(V) consist of the nodes in Node(G) \ V that are adjacent to V in G, and let nbrG(V) = |NbrG(V)|. A connected component of graph G is a maximal subset C of Node(G) such that G[C] is connected.
Lemma 2.3. Let G be a graph class satisfying log num(G, n) = O(n). Given positive integers $\ell$ and n with $\ell = \mathrm{poly}(\log \log n)$, it takes overall o(n) time to compute (i) a labeling Label(H) and a $\lceil \log \mathrm{num}(G, \mathrm{node}(H)) \rceil$-bit binary string Optcode(H) for each distinct graph H ∈ G with at most $\ell$ nodes and (ii) an o(n)-bit string Table(G, $\ell$) such that the following statements hold:
(1) Given a graph H ∈ G with node(H) ≤ $\ell$, it takes O(node(H)) time to obtain Optcode(H) and Label(H) from Table(G, $\ell$).
(2) Given Optcode(H) for a graph H ∈ G with node(H) ≤ $\ell$, it takes O(node(H)) time to obtain H and Label(H) from Table(G, $\ell$).
Proof. It is straightforward by $O(1)^{\mathrm{poly}(\ell)} = o(n)$.
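To make the idea of a precomputation table concrete, the following Python sketch builds one by brute force for tiny graphs. It is an illustration only: it uses forests as an example hereditary class, canonicalizes graphs by trying every relabeling (feasible only for very small node counts), assigns each isomorphism class an index written in ⌈log num⌉ bits, and does not model Label(H) or the O(node(H))-time lookups of the lemma. All function names are hypothetical.

```python
from itertools import combinations, permutations
from math import ceil, log2


def canonical(n, edges):
    """Smallest sorted edge list over all relabelings of nodes 0..n-1."""
    best = None
    for perm in permutations(range(n)):
        key = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or key < best:
            best = key
    return best


def is_forest(n, edges):
    """Membership test for the example class: graphs without cycles."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                    # this edge would close a cycle
        parent[ru] = rv
    return True


def build_table(max_nodes, in_class):
    """Map node count -> (encode, decode) dictionaries over canonical forms."""
    table = {}
    for n in range(1, max_nodes + 1):
        pairs = list(combinations(range(n), 2))
        shapes = set()
        for r in range(len(pairs) + 1):
            for edges in combinations(pairs, r):
                if in_class(n, edges):
                    shapes.add(canonical(n, edges))
        ordered = sorted(shapes)
        width = ceil(log2(len(ordered))) if len(ordered) > 1 else 0
        encode = {s: (format(i, f"0{width}b") if width else "")
                  for i, s in enumerate(ordered)}
        decode = {code: s for s, code in encode.items()}
        table[n] = (encode, decode)
    return table
```

With `max_nodes = 4` the table for 4-node forests has six entries, so each code has ⌈log 6⌉ = 3 bits.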
2.3. Separator decomposition of planar graphs. Sets S1, S2, . . . , Sd form a disjoint partition of set S if S1, . . . , Sd are pairwise disjoint and S = S1 ∪ · · · ∪ Sd. A subset S of Node(G) is a separator of graph G with respect to S1 and S2 if (1) S, S1, and S2 form a disjoint partition of Node(G), (2) S1 and S2 are not adjacent in G, (3) $|S| = O(\mathrm{node}(G)^{1/2})$, and (4) $\max\{|S_1|, |S_2|\} \le \frac{2}{3} \cdot \mathrm{node}(G)$. A separator decomposition [12] of G is a rooted binary tree T on a disjoint partition of Node(G) such that the following two statements hold, where “nodes” specify elements of Node(G) and “vertices” specify elements of Node(T). Statement 1: Each leaf vertex of T consists of a single node of G. Statement 2: Each internal vertex S of T is a separator of G[Offspring(S)] with respect to Offspring(S1) and Offspring(S2), where S1 and S2 are the child vertices of S in T and Offspring(S) (respectively, Offspring(S1) and Offspring(S2)) is the union of all the vertices in the subtree of T rooted at S (respectively, S1 and S2). See Figure 2 for an illustration.
Lemma 2.4 (Goodrich [40]). It takes O(n) time to compute a separator decomposition for any given n-node planar graph.
2.4. Planarizers for nonplanar graphs. The genus of a graph G is defined to be the smallest integer g such that G can be embedded on an orientable surface with g handles without edge crossings [41]. For example, the genus of a planar graph is zero. By Euler’s formula (see, e.g., [39]), an n-node genus-O(n) graph has O(n) edges. Determining the genus of a general graph is NP-complete [93], but Mohar [70] showed that it takes linear time to determine whether a graph is of genus g
Fig. 3. (a) A 9-node plane graph with a separation [V0, . . . , V3]. (b) G[V0], G(V1), G(V2), and G(V3) form a disjoint partition of the edges of G.
for any g = O(1). Mohar’s algorithm is simplified by Kawarabayashi, Mohar, and Reed [54].
Lemma 2.5 (Kawarabayashi, Mohar, and Reed [54] and Mohar [70]). It takes O(n) time to compute a genus-O(1) embedding for any given n-node genus-O(1) graph.
Gilbert, Hutchinson, and Tarjan [39] gave an O(n + g)-time algorithm to compute an $O((gn)^{0.5})$-node separator of an n-node genus-g graph, generalizing Lipton and Tarjan’s classic separator theorem for planar graphs [63]. Our result relies on the following planarization algorithm.
Lemma 2.6 (Djidjev and Venkatesan [26]). Given an n-node graph G embedded on a genus-g surface, it takes O(n + g) time to compute a subset V of Node(G) with $|V| = O((gn)^{0.5})$ such that G \ V is planar.
3. Separation and refinement. We say that $[V_0, V_1, \ldots, V_p]$ with $p \ge 1$ is a separation of graph G if the following properties hold:
Property S1. $V_0, V_1, \ldots, V_p$ form a disjoint partition of Node(G).
Property S2. Any two $V_i$ and $V_{i'}$ with $1 \le i \ne i' \le p$ are not adjacent in G.
Figure 3(a) shows a separation $[V_0, V_1, V_2, V_3]$ of graph G, and Figure 4(a) shows another separation $[U_0, U_1, U_2]$ of G. For any subset V of Node(G), let G(V) be the subgraph of G induced by V ∪ NbrG(V) excluding the edges of G[NbrG(V)]. If $[V_0, \ldots, V_p]$ is a separation of G, then $G[V_0], G(V_1), \ldots, G(V_p)$ form a disjoint partition of the edges of G. See Figures 3(b) and 4(b) for illustrations. Let $\log^{(0)} n = n$. For any positive integer k, let $\log^{(k)} n = \log(\log^{(k-1)} n)$. For notational brevity, for any nonnegative integer k, let
$\ell_k = \max\{1, \log^{(k)} n\}$.
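Properties S1 and S2 and the induced edge partition can be sketched in Python as follows. The sketch assumes a graph given as a dictionary from integer nodes to neighbor sets and a separation given as a list of sets [V0, V1, ..., Vp]; the function names are hypothetical.

```python
def is_separation(adj, parts):
    """Check Properties S1 and S2 for parts = [V0, V1, ..., Vp]."""
    nodes = [v for part in parts for v in part]
    if len(nodes) != len(set(nodes)) or set(nodes) != set(adj):
        return False                        # Property S1 fails
    owner = {v: i for i, part in enumerate(parts) for v in part}
    for u in adj:
        for v in adj[u]:
            if owner[u] >= 1 and owner[v] >= 1 and owner[u] != owner[v]:
                return False                # Property S2 fails
    return True


def edge_partition(adj, parts):
    """Edge sets of G[V0], G(V1), ..., G(Vp) for a valid separation."""
    owner = {v: i for i, part in enumerate(parts) for v in part}
    blocks = [set() for _ in parts]
    for u in adj:
        for v in adj[u]:
            if u < v:                       # visit each undirected edge once
                # An edge with an endpoint in V_i (i >= 1) belongs to G(V_i);
                # an edge with both endpoints in V0 belongs to G[V0].
                blocks[max(owner[u], owner[v])].add((u, v))
    return blocks
```

Taking the larger owner index is enough here because, for a valid separation, the two endpoints of an edge lie either in the same part or in V0 and one other part.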
For any nonnegative integer k, separation $[V_0, \ldots, V_p]$ of an n-node graph G is a k-separation of G if the following three properties hold:
Property S3. $|V_0| = o(n/\ell_k)$ and $p = o(n/\ell_k) + 1$.
Property S4. $|V_i| + \mathrm{nbr}_G(V_i) = \mathrm{poly}(\ell_k)$ holds for each i = 1, . . . , p.
Property S5. $\sum_{i=1}^{p} \mathrm{nbr}_G(V_i) = o(n/\ell_k)$.
One can easily verify that [∅, Node(G)] is a 0-separation of G.⁵ Let $[V_0, \ldots, V_p]$ and $[U_0, \ldots, U_q]$ be two separations of graph G. We say that $[V_0, \ldots, V_p]$ is a refinement of $[U_0, \ldots, U_q]$ if the following three properties hold:
⁵The “+1” in Property S3 is redundant for k ≥ 1. However, we need it so that [∅, Node(G)] is a 0-separation of G, since $1 \ne o(n/\ell_0)$.
Fig. 4. (a) Separation [V0, V1, V2, V3] is a refinement of separation [U0, U1, U2]. (b) Subgraphs G[U0], G(U1), and G(U2) of G.
Property R1. $U_0 \subseteq V_0$.
Property R2. For each index i = 1, . . . , p, there is an index j with $1 \le j \le q$ and $V_i \subseteq U_j$.
Property R3. For any indices $i, i', i''$ with $1 \le i < i' < i'' \le p$, if $V_i \cup V_{i''} \subseteq U_j$, then $V_{i'} \subseteq U_j$.
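Properties R1, R2, and R3 translate directly into a small check. The Python sketch below assumes both separations are lists of sets with nonempty parts at positive indices and does not re-verify that the inputs are separations; the function name is hypothetical.

```python
def is_refinement(fine, coarse):
    """Check R1-R3 for fine = [V0, ..., Vp] against coarse = [U0, ..., Uq]."""
    if not coarse[0] <= fine[0]:
        return False                        # Property R1: U0 is a subset of V0
    host = []
    for part in fine[1:]:
        js = [j for j in range(1, len(coarse)) if part <= coarse[j]]
        if not js:
            return False                    # Property R2: no U_j contains V_i
        host.append(js[0])
    # Property R3: the parts inside one U_j occupy consecutive indices, so a
    # host index may not reappear after the sequence has moved on from it.
    seen, previous = set(), None
    for current in host:
        if current != previous:
            if current in seen:
                return False
            seen.add(current)
        previous = current
    return True
```

Property R3 is what later allows an index i to be stored as a short offset within the parts of a single U_j.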
For instance, in Figure 4(a), [V0, V1, V2, V3] is a refinement of [U0, U1, U2]. Below is the main lemma of the section.
Lemma 3.1. Let k be a positive integer. Let G be an n-node connected graph embedded on a genus-$o(n/\ell_k^2)$ surface. Given a (k − 1)-separation $S_{k-1}$ of G, it takes O(n) time to compute a k-separation $S_k$ of G that is a refinement of $S_{k-1}$.
The proof of Lemma 3.1 needs the following lemma, which can be proved by Lemmas 2.4 and 2.6.
Lemma 3.2. Let k be a positive integer. Given an n-node graph G embedded on a genus-$o(n/\ell_k^2)$ surface, it takes O(n) time to compute an $o(n/\ell_k)$-node subset V of Node(G) such that each node of Node(G) \ V has degree at most $\ell_k^2$ in G and each connected component of G \ V has at most $\ell_k^4$ nodes.
Proof. We first apply Lemma 2.6 to compute in O(n) time an $o(n/\ell_k)$-node subset $V'$ of Node(G) such that $G \setminus V'$ is planar. We then apply Lemma 2.4 to compute in O(n) time a separator decomposition T of $G \setminus V'$. For each vertex S of T, let Offspring(S) denote the union of all the vertices in the subtree of T rooted at S, and let offspring(S) = |Offspring(S)|. Let $r = \ell_k^2$. Let $V''$ consist of the nodes of G with degree more than r in G. Let $V'''$ be the union of all the vertices S of T with $\mathrm{offspring}(S) > r^2$. Let $V = V' \cup V'' \cup V'''$. By $V' \cup V''' \subseteq V$ and the definition of T, each connected component of $G \setminus V$ has at most $r^2$ nodes. By $V'' \subseteq V$, each node of Node(G) \ V has degree at most r in G. Since G has O(n) edges, $|V''| = O(n/r) = o(n/\ell_k)$.
It remains to show that $|V'''| = o(n/\ell_k)$. For each index $i \ge 1$, let $I_i$ consist of the vertices S of T with $r^2 \cdot (3/2)^{i-1} < \mathrm{offspring}(S) \le r^2 \cdot (3/2)^{i}$. By $r^2 \ge 1$ and $i \ge 1$, each $S \in I_i$ is an internal vertex of T. By definition of T, we know that Offspring(S) and Offspring(S') are disjoint for any two distinct elements $S$ and $S'$ of $I_i$, implying that $\sum_{S \in I_i} \mathrm{offspring}(S) \le n$ holds. Since $\mathrm{offspring}(S) > r^2 \cdot (1.5)^{i-1}$ holds for each $S \in I_i$, we have $|I_i| < n/(r^2 \cdot (1.5)^{i-1})$. Since each $S \in I_i$ is an internal vertex of T, S is a separator of G[Offspring(S)]. Therefore, $|S| = O(r \cdot (1.5)^{i/2})$ holds for each vertex S in $I_i$. We have
$$|V'''| = \sum_{i \ge 1} \sum_{S \in I_i} |S| = \sum_{i \ge 1} O\!\left(\frac{n}{r \cdot (1.5)^{i/2}}\right) = O\!\left(\frac{n}{r}\right) = o\!\left(\frac{n}{\ell_k}\right).$$
The lemma is proved.
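The construction in this proof can be sketched in Python once a planarizer and a separator decomposition are available. The sketch below takes both as inputs rather than computing them (Lemmas 2.6 and 2.4 are not implemented), represents a tree vertex as a hypothetical triple (node set, left subtree, right subtree) with None for missing children, and treats r as a plain parameter.

```python
def large_separators(tree, threshold):
    """Union of the tree vertices S whose offspring count exceeds threshold."""
    picked = set()

    def size(t):
        if t is None:
            return 0
        sep, left, right = t
        total = len(sep) + size(left) + size(right)   # offspring(S)
        if total > threshold:
            picked.update(sep)
        return total

    size(tree)
    return picked


def removal_set(adj, planarizer, tree, r):
    """Planarizer, nodes of degree above r, and separators with offspring > r^2."""
    high_degree = {v for v, nbrs in adj.items() if len(nbrs) > r}
    return set(planarizer) | high_degree | large_separators(tree, r * r)


def component_sizes(adj, removed):
    """Sizes of the connected components that remain after deleting removed."""
    seen, sizes = set(removed), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack, count = [s], 0
        while stack:
            u = stack.pop()
            count += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(count)
    return sizes
```

On a 7-node path with the middle node as root separator and r = 2, only the root has more than r² = 4 offspring, so the removal set is the middle node and two 3-node components remain.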
Algorithm 1
Let p = 0, and let all elements of $\mathcal{C}$ be initially unmarked.
For each j = 1, . . . , q, perform the following repeat-loop.
  Repeat the following steps until all elements of $\mathcal{C}_j$ are marked:
    Let $v_0$ be an arbitrary node of $V_0$ adjacent to some unmarked element of $\mathcal{C}_j$.
    Let $\mathcal{U}$ consist of the unmarked elements of $\mathcal{C}_j$ that are adjacent to $v_0$ in G.
    Let $C_{i_1}, \ldots, C_{i_3}$ be the elements of $\mathcal{U}$ in clockwise order around $v_0$ in G.
    Mark all $i_3 - i_1 + 1$ elements of $\mathcal{U}$.
    Repeat the following four steps until $i_1 > i_3$:
      Let $i_2$ be the largest index with $i_1 \le i_2 \le i_3$ and $|C_{i_1}| + \cdots + |C_{i_2}| \le \ell_k^4$.
      Let p = p + 1.
      Let $\mathrm{hook}_p = v_0$ and $V_p = C_{i_1} \cup \cdots \cup C_{i_2}$.
      Let $i_1 = i_2 + 1$.
Output $V_1, \ldots, V_p$ and $\mathrm{hook}_1, \ldots, \mathrm{hook}_p$.
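A minimal Python sketch of Algorithm 1 follows. The input format is an assumption made for illustration: `components` maps a component identifier to its node set, `group` maps a component identifier to the index j of the set U_j containing it, `rotation` maps each node of V0 to its neighbors in clockwise order, and `cap` stands for the size bound of the inner loop. Visiting the nodes of V0 in dictionary order is one admissible choice of the "arbitrary" node v0.

```python
def algorithm1(components, group, rotation, cap):
    """Greedily merge components around each hook; returns (parts, hooks)."""
    node_comp = {v: c for c, nodes in components.items() for v in nodes}
    marked, parts, hooks = set(), [], []
    for j in sorted(set(group.values())):
        for v0, neighbours in rotation.items():
            # Unmarked components of group j adjacent to v0, in clockwise order.
            order = []
            for u in neighbours:
                c = node_comp.get(u)
                if c is not None and group[c] == j and c not in marked and c not in order:
                    order.append(c)
            marked.update(order)
            i1 = 0
            while i1 < len(order):
                total, i2 = 0, i1           # i2 is an exclusive end index here
                while i2 < len(order) and total + len(components[order[i2]]) <= cap:
                    total += len(components[order[i2]])
                    i2 += 1
                assert i2 > i1, "every component must have at most cap nodes"
                parts.append(set().union(*(components[c] for c in order[i1:i2])))
                hooks.append(v0)
                i1 = i2
    return parts, hooks
```

With six components of sizes 1, 2, 3, 1, 3, 1, a bound of 7, and the last four components arranged clockwise around one hook, the sketch merges the third, fourth, and fifth components into one part and leaves the sixth as a separate part with the same hook.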
Fig. 5. An illustration for Algorithm 1.
Proof of Lemma 3.1. Suppose that $[U_0, \ldots, U_q]$ is the given (k − 1)-separation $S_{k-1}$. Let $V_0'$ be the O(n)-time computable subset of Node(G) ensured by Lemma 3.2. We have $|V_0'| = o(n/\ell_k)$. Let $V_0 = U_0 \cup V_0'$. Let $\mathcal{C}$ consist of the connected components of $G \setminus V_0$. By $V_0' \subseteq V_0$, each element of $\mathcal{C}$ has at most $\ell_k^4$ nodes. By $U_0 \subseteq V_0$ and Properties S1 and S2 of $S_{k-1}$, each element of $\mathcal{C}$ is contained by some $U_j$ with $1 \le j \le q$. For each j = 1, . . . , q, let $\mathcal{C}_j$ consist of the elements C of $\mathcal{C}$ with $C \subseteq U_j$. We run Algorithm 1 to obtain (a) a disjoint partition $V_1, \ldots, V_p$ of the nodes of $G \setminus V_0$ and (b) p nodes $\mathrm{hook}_1, \ldots, \mathrm{hook}_p$ of $V_0$, which may not be distinct. Let $S_k = [V_0, \ldots, V_p]$. Since G is connected, each element of $\mathcal{C}$ is adjacent to $V_0$. The first statement of the outer repeat-loop is well defined. Since each element of $\mathcal{C}$ has at most $\ell_k^4$ nodes, the first statement of the inner repeat-loop is well defined. See Figure 5 for an illustration: Suppose that all nodes are in $U_1$. All nodes are initially unmarked. Let $V_0$ consist of the nine unlabeled nodes, including the three gray nodes. For each i = 1, . . . , 6, let $C_i$ consist of the nodes with label i. That is, $C_1, \ldots, C_6$ are the six connected components of $G \setminus V_0$. Suppose that $\ell_k^4 = 7$ and the first two iterations of the outer repeat-loop obtain $V_1 = C_1$ and $V_2 = C_2$. In the third iteration of the outer repeat-loop, $C_3, \ldots, C_6$ are the unmarked elements of $\mathcal{C}$ that are adjacent to $\mathrm{hook}_3$ in clockwise order around $\mathrm{hook}_3$. By $|C_3| + |C_4| + |C_5| = 7$, the two iterations of the inner repeat-loop obtain $V_3 = C_3 \cup C_4 \cup C_5$ and $V_4 = C_6$.
By definition of Algorithm 1, one can verify that Properties R1, R2, and R3 hold for $S_{k-1}$ and $S_k$ (that is, $S_k$ is a refinement of $S_{k-1}$) and Properties S1 and S2 hold for $S_k$. By Property S3 of $S_{k-1}$, we have $|U_0| = o(n/\ell_{k-1}) = o(n/\ell_k)$. By $|V_0'| = o(n/\ell_k)$,
Fig. 6. The operation that contracts all nodes of Vi into a node vi, which takes over some neighbors of hooki. Panels (a), (b), and (c).
we have $|V_0| \le |U_0| + |V_0'| = o(n/\ell_k)$. Let $I_{\mathrm{small}}$ consist of the indices i with $1 \le i \le p$ and $|V_i| \le \frac{1}{2} \ell_k^4$. Let $I_{\mathrm{large}}$ consist of the indices i with $1 \le i \le p$ and $|V_i| > \frac{1}{2} \ell_k^4$. We show that $p = |I_{\mathrm{small}}| + |I_{\mathrm{large}}| = o(n/\ell_k)$ as follows. By Property S1 of $S_k$, we have $|I_{\mathrm{large}}| = o(n/\ell_k)$. To show that $|I_{\mathrm{small}}| = o(n/\ell_k)$, we categorize the indices i in $I_{\mathrm{small}}$ with $1 \le i < p$ into the following types, where j is the index with $V_i \subseteq U_j$:
Type 1: $i \in I_{\mathrm{small}}$ and $i + 1 \in I_{\mathrm{large}}$. The number of such indices i is no more than $|I_{\mathrm{large}}| = o(n/\ell_k)$.
Type 2: $i \in I_{\mathrm{small}}$ and $i + 1 \in I_{\mathrm{small}}$.
Type 2a: $V_{i+1} \subseteq U_{j+1}$. The number of such indices i is no more than $q = o(n/\ell_{k-1}) = o(n/\ell_k)$.
Type 2b: $V_{i+1} \subseteq U_j$ and $\mathrm{hook}_i \in V_0 \setminus U_0$. By Properties S1 and S2 of $S_{k-1}$, we know that $\mathrm{hook}_i \in U_j$. By definition of Algorithm 1, $\mathrm{hook}_i \ne \mathrm{hook}_{i'}$ holds for all indices $i'$ with $i < i' \le p$. The number of such indices i is no more than $|V_0 \setminus U_0| \le |V_0'| = o(n/\ell_k)$.
Type 2c: $V_{i+1} \subseteq U_j$ and $\mathrm{hook}_i \in U_0$. We have $\mathrm{hook}_i \in \mathrm{Nbr}_G(U_j)$. By definition of Algorithm 1, $\mathrm{hook}_i \ne \mathrm{hook}_{i'}$ holds for all indices $i' > i$ with $V_{i'} \subseteq U_j$. By Property S5 of $S_{k-1}$, the number of such indices i is no more than $\sum_{j=1}^{q} \mathrm{nbr}_G(U_j) = o(n/\ell_{k-1}) = o(n/\ell_k)$.
We have $p = o(n/\ell_k)$. Property S3 holds for $S_k$. By definition of Algorithm 1, $|V_i| \le \ell_k^4$ holds for each i = 1, . . . , p. By $V_0' \subseteq V_0$, each node of Node(G) \ $V_0$ has degree at most $\ell_k^2$. Property S4 holds for $S_k$.
To see Property S5 of $S_k$, we obtain a contracted graph from G by performing the following two steps for each i = 1, . . . , p.⁶ Step 1: Let $C_{i_1}, \ldots, C_{i_2}$ be the elements of $\mathcal{C}$ with $V_i = C_{i_1} \cup C_{i_1+1} \cup \cdots \cup C_{i_2}$ in clockwise order around $\mathrm{hook}_i$ in G. Split $\mathrm{hook}_i$ into two adjacent nodes $\mathrm{hook}_i$ and $v_i$, and let $v_i$ take over the neighbors of $\mathrm{hook}_i$ in clockwise order around $\mathrm{hook}_i$ from the first neighbor of $\mathrm{hook}_i$ in $C_{i_1}$ to the first neighbor of $\mathrm{hook}_i$ in $C_{i_2}$. Step 2: Contract all nodes of $V_i$ into node $v_i$, and delete multiple edges and self-loops. See Figure 6 for an illustration: For each i = 3, . . . , 6, let $C_i$ consist of the nodes with labels i in Figure 6(a). Suppose that $i_1 = 3$, $i_2 = 5$, and $V_i = C_3 \cup C_4 \cup C_5$. The unlabeled circle nodes belong to $V_0$. The square nodes are two previously contracted nodes $v_{i'}$ and $v_{i''}$ from $V_{i'}$ and $V_{i''}$ for some indices $i'$ and $i''$ with $1 \le i' \ne i'' < i$. Figure 6(b) shows the result of Step 1. Figure 6(c) shows the result of Step 2. Observe that each node that is adjacent to $V_i$ becomes a
⁶The contraction procedure is only for proving Property S5 of $S_k$; it is not needed for computing $S_k$.
Fig. 7. (a) Graph G with a labeling. (b) Subgraphs G(V1), G(V2), and G(V3) of G with labelings. (c) Subgraphs G(U1) and G(U2) of G with labelings.
neighbor of $v_i$ after applying Steps 1 and 2. Also, each neighbor of $\mathrm{hook}_i$ that is not in $V_i$ either remains a neighbor of $\mathrm{hook}_i$ or becomes a neighbor of $v_i$ after applying Steps 1 and 2. Therefore, for each i = 1, . . . , p and each node $v_0 \in \mathrm{Nbr}_G(V_i)$, there is either an edge $(v_0, v_i)$ or an edge $(v_i, v_{i'})$ for some index $i'$ with $i' > i$ and $\mathrm{hook}_{i'} = v_0$. Thus, $\sum_{i=1}^{p} \mathrm{nbr}_G(V_i)$ is no more than the number of edges in the resulting contracted simple graph, which has $|V_0| + p = o(n/\ell_k)$ nodes. Observe that Step 1 does not increase the genus of the embedding. Since the subgraph induced by $V_i \cup \{v_i\}$ is connected, Step 2 does not increase the genus of the embedding either. The number of edges in the resulting contracted simple genus-$o(n/\ell_k^2)$ graph is $o(n/\ell_k)$. Property S5 holds for $S_k$. The lemma is proved.
4. Our compression scheme. This section proves Theorem 1.1.
4.1. Recovery string. A labeling of graph G is a one-to-one mapping from Node(G) to {0, 1, . . . , node(G) − 1}. For instance, Figure 7(a) shows a labeling for graph G. Let G be a graph embedded on a surface. We say that a graph Δ embedded on the same surface is a triangulation of G if G is a subgraph of Δ with Node(Δ) = Node(G) such that each face of Δ has three nodes. The following lemma shows an o(n)-bit string with which the larger embedded labeled subgraphs of G can be recovered from smaller embedded labeled subgraphs of G in O(n) time.
Lemma 4.1. Let k be a positive integer. Let G be an n-node graph embedded on a genus-$o(n/\ell_k)$ surface. Let Δ be a triangulation of G. Let $S_k = [V_0, \ldots, V_p]$ be a given k-separation of Δ and $S_{k-1} = [U_0, \ldots, U_q]$ be a given (k − 1)-separation of Δ such that $S_k$ is a refinement of $S_{k-1}$. For any given labeling $L_{k,i}$ of $G(V_i)$ for each i = 1, . . . , p, the following statements hold:
(1) It takes overall O(n) time to compute a labeling $L_{k-1,j}$ of subgraph $G(U_j)$ for each j = 1, . . . , q.
(2) Given the above labelings $L_{k-1,j}$ of subgraphs $G(U_j)$ with $1 \le j \le q$, it takes O(n) time to compute an o(n)-bit string $\mathrm{Rec}_k$ such that $G(U_j)$ and $L_{k-1,j}$ for all j = 1, . . . , q can be recovered in overall O(n) time from $\mathrm{Rec}_k$ and $G(V_i)$ and $L_{k,i}$ for all i = 1, . . . , p.
Proof. Since G is a subgraph of Δ with Node(Δ) = Node(G), one can easily verify that $S_{k-1}$ (respectively, $S_k$) is also a (k − 1)-separation (respectively, k-separation) of G. For each j = 1, . . . , q, let $I_j$ consist of the indices i with $V_i \subseteq U_j$. Let $W_j$ consist of the nodes of $G(U_j)$ that are not in any $V_i$ with $i \in I_j$. By Properties S1 and S2 of $S_k$, $W_j \subseteq V_0$. For instance, if G is as shown in Figure 7(a), where $v_t$ with $0 \le t \le 8$ denotes the node with label t, we have $I_1 = \{1\}$, $I_2 = \{2, 3\}$, $W_1 = \{v_2, v_3\}$, and $W_2 = \{v_0, v_1, v_2, v_6\}$. Let the labeling $L_{k-1,j}$ for $G(U_j)$ be defined as follows:
• For the nodes of $G(U_j)$ in $W_j$, let $L_{k-1,j}$ be an arbitrary one-to-one mapping from $W_j$ to $\{0, 1, \ldots, |W_j| - 1\}$. In Figure 7(c), we have $L_{k-1,1}(v_2) = 1$, $L_{k-1,1}(v_3) = 0$, $L_{k-1,2}(v_0) = 2$, $L_{k-1,2}(v_1) = 3$, $L_{k-1,2}(v_2) = 0$, and $L_{k-1,2}(v_6) = 1$.
• For the nodes of $G(U_j)$ not in $W_j$, let $L_{k-1,j}$ be the one-to-one mapping from $\bigcup_{i \in I_j} V_i$ to $\{|W_j|, |W_j| + 1, \ldots, \mathrm{node}(G(U_j)) - 1\}$ obtained by sorting $(i, L_{k,i}(v))$ for all indices $i \in I_j$ and all nodes $v \in V_i$ such that $L_{k-1,j}(v) < L_{k-1,j}(v')$ holds for a node $v$ of $V_i$ and a node $v'$ of $V_{i'}$ if and only if (a) $i < i'$ or (b) $i = i'$ and $L_{k,i}(v) < L_{k,i}(v')$. For instance, if $L_{k,1}$, $L_{k,2}$, and $L_{k,3}$ are as shown in Figure 7(b), then $L_{k-1,1}$ and $L_{k-1,2}$ can be as shown in Figure 7(c) and $L_{k-2,1}$ can be as shown in Figure 7(a).
It takes $O(\mathrm{node}(G(U_j))) = O(|U_j| + \mathrm{nbr}_G(U_j))$ time to compute $L_{k-1,j}$ from all $L_{k,i}$ with $i \in I_j$. By Property S5 of $S_{k-1}$, it takes overall O(n) time to compute all $L_{k-1,j}$ with $1 \le j \le q$ from all $L_{k,i}$ with $1 \le i \le p$. Statement (1) is proved.
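The two rules defining the labeling of G(Uj) can be sketched in Python as follows. The sketch uses a comparison sort for brevity, so it does not reproduce the linear running time claimed above; the input format and the function name are assumptions made for illustration.

```python
def merge_labelings(W, parts, labelings):
    """Build the labeling of G(U_j) from the labelings of its parts.

    W         -- list of the nodes of G(U_j) that lie in no V_i
    parts     -- {i: node set V_i} for the indices i of the parts inside U_j
    labelings -- {i: {node: label}} giving the labeling of G(V_i)
    """
    # Nodes of W receive the labels 0 .. |W| - 1 in an arbitrary (list) order.
    merged = {w: t for t, w in enumerate(W)}
    # Remaining nodes are ordered by part index, then by label within the part.
    keyed = sorted((i, labelings[i][v], v) for i in parts for v in parts[i])
    for offset, (_, _, v) in enumerate(keyed):
        merged[v] = len(W) + offset
    return merged
```

Labels of nodes that appear in a part's labeling but lie outside that part (its neighbors in V0) are ignored by the second rule, because those nodes are labeled through W.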
By Property S4 of $S_{k-1}$, the label of each node of $G(U_j)$ assigned by $L_{k-1,j}$ can be represented by $O(\log \mathrm{poly}(\ell_{k-1})) = O(\ell_k)$ bits. By Property S4 of $S_k$, the label of each node of $G(V_i)$ assigned by $L_{k,i}$ can be represented by $O(\log \mathrm{poly}(\ell_k)) = O(\ell_{k+1})$ bits. For each index j = 1, . . . , q,
• string $\mathrm{Rec}'_{k,j}$ stores the adjacency list of the embedded subgraph of $G(U_j)$ induced by $W_j$ via the labeling $L_{k-1,j}$ of $W_j$,
• string $\mathrm{Rec}''_{k,j}$ stores the information required to recover $L_{k-1,j}$ from all $L_{k,i}$ with $i \in I_j$, and
• string $\mathrm{Rec}'''_{k,j}$ stores the information required to recover the embedding of $G(U_j)$ from the embeddings of all $G(V_i)$ with $i \in I_j$ and the embedding of the subgraph of $G(U_j)$ induced by $W_j$.
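By definition of $W_j$, we have $|W_j| = |V_0 \cap U_j| + \mathrm{nbr}_G(U_j)$. It follows from Property S3 of $S_k$ and Property S5 of $S_{k-1}$ that
$$\sum_{j=1}^{q} |W_j| \le |V_0| + \sum_{j=1}^{q} \mathrm{nbr}_G(U_j) = o\!\left(\frac{n}{\ell_k}\right) + o\!\left(\frac{n}{\ell_{k-1}}\right) = o\!\left(\frac{n}{\ell_k}\right).$$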
Let $W = \bigcup_{j=1}^{q} W_j$. Since $G[V_0], G(V_1), \ldots, G(V_p)$ form a disjoint partition of the edges of G, the overall number of edges in the subgraphs of $G(U_j)$ induced by $W_j$ for all j = 1, . . . , q is no more than the number of edges in G[W], which is $O(|W| + o(n/\ell_k)) \le O(\sum_{j=1}^{q} |W_j|) + o(n/\ell_k) = o(n/\ell_k)$. Therefore,
$$\sum_{j=1}^{q} \|\mathrm{Rec}'_{k,j}\| = o\!\left(\frac{n}{\ell_k}\right) \cdot O(\ell_k) = o(n). \tag{1}$$
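It suffices for $\mathrm{Rec}''_{k,j}$ to store the list of $(i, L_{k,i}(v), L_{k-1,j}(v))$ for all $i \in I_j$ and all $v \in \mathrm{Nbr}_G(V_i)$. By Property R3 of $S_{k-1}$ and $S_k$ and Property S4 of $S_{k-1}$, index i can be represented by an $O(\ell_k)$-bit offset t such that i is the tth smallest index in $I_j$. Thus, $\|\mathrm{Rec}''_{k,j}\| = \sum_{i \in I_j} \mathrm{nbr}_G(V_i) \cdot O(\ell_k)$. By Property S5 of $S_k$, we have $\sum_{j=1}^{q} \sum_{i \in I_j} \mathrm{nbr}_G(V_i) = \sum_{i=1}^{p} \mathrm{nbr}_G(V_i) = o(n/\ell_k)$. Therefore,
$$\sum_{j=1}^{q} \|\mathrm{Rec}''_{k,j}\| = o\!\left(\frac{n}{\ell_k}\right) \cdot O(\ell_k) = o(n). \tag{2}$$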