
LINEAR-TIME COMPRESSION OF BOUNDED-GENUS GRAPHS INTO INFORMATION-THEORETICALLY OPTIMAL NUMBER OF BITS∗

HSUEH-I LU†

Abstract. A compression scheme A for a class $\mathbb{G}$ of graphs consists of an encoding algorithm Encode_A that computes a binary string Code_A(G) for any given graph G in $\mathbb{G}$ and a decoding algorithm Decode_A that recovers G from Code_A(G). A compression scheme A for $\mathbb{G}$ is optimal if both Encode_A and Decode_A run in linear time and the number of bits of Code_A(G) for any n-node graph G in $\mathbb{G}$ is information-theoretically optimal to within lower-order terms. Trees and plane triangulations were the only known nontrivial graph classes to admit optimal compression schemes.

Based upon Goodrich’s separator decomposition for planar graphs and Djidjev and Venkatesan’s planarizers for bounded-genus graphs, we give an optimal compression scheme for any hereditary (i.e., closed under taking subgraphs) class $\mathbb{G}$ under the premise that any n-node graph of $\mathbb{G}$ to be encoded comes with a genus-$o(n/\log^2 n)$ embedding. By Mohar’s linear-time algorithm that embeds a bounded-genus graph on a genus-O(1) surface, our result implies that any hereditary class of genus-O(1) graphs admits an optimal compression scheme. For instance, our result yields the first-known optimal compression schemes for planar graphs, plane graphs, graphs embedded on genus-1 surfaces, graphs with genus 2 or less, 3-colorable directed plane graphs, 4-outerplanar graphs, and forests with degree at most 5. For nonhereditary graph classes, we also give a methodology for obtaining optimal compression schemes. From this methodology, we give the first-known optimal compression schemes for triangulations of genus-O(1) surfaces and floorplans.

Key words. trees, planar graphs, graph algorithms, data structures, compression

AMS subject classifications. 05C05, 05C10, 05C85, 68P05, 68P30

DOI. 10.1137/120879142

1. Introduction. Compact representations of graphs are fundamentally important and useful in many applications, including representing the meshes in finite element analysis, terrain models of GIS, three-dimensional (3D) models of graphics [48, 64, 80, 81, 82, 85, 89, 92], and VLSI design [56, 84], designing compact routing tables of computer networks [1, 3, 16, 35, 36, 38, 66, 77, 94, 95], and compressing the link structure of the Internet [2, 5, 7, 15, 21, 88]. Let $\mathbb{G}$ be a class of graphs. Let $\mathrm{num}(\mathbb{G}, n)$ denote the number of distinct n-node graphs in $\mathbb{G}$. The information-theoretically optimal number of bits to encode an n-node graph in $\mathbb{G}$ is $\log \mathrm{num}(\mathbb{G}, n)$.¹ For instance, if $\mathbb{G}$ is the class of rooted trees, then $\mathrm{num}(\mathbb{G}, n) \approx 2^{2n}/n^{3/2}$ and $\log \mathrm{num}(\mathbb{G}, n) = 2n - O(\log n)$; if $\mathbb{G}$ is the class of plane triangulations, then $\log \mathrm{num}(\mathbb{G}, n) = n \log \frac{256}{27} + o(n) \approx 3.2451n + o(n)$ [97]. A compression scheme A for $\mathbb{G}$ consists of an encoding algorithm Encode_A that computes a binary string Code_A(G) for any given graph G in $\mathbb{G}$ and a decoding algorithm Decode_A that recovers graph G from Code_A(G). A compression scheme A for a graph class $\mathbb{G}$ with $\log \mathrm{num}(\mathbb{G}, n) = O(n)$

∗Received by the editors May 29, 2012; accepted for publication (in revised form) January 9, 2014; published electronically March 20, 2014. A preliminary version of this paper appeared in Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2002, pp. 223–224. This research was supported in part by NSC grant 101–2221–E–002–062–MY3.

http://www.siam.org/journals/sicomp/43-2/87914.html

†Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan, ROC (hil@csie.ntu.edu.tw, http://www.csie.ntu.edu.tw/~hil/). The author also holds joint appointments from the Graduate Institute of Networking and Multimedia and the Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University.

¹All logarithms throughout the paper are to the base of two.


is optimal if the following three conditions hold:

Condition C1. The running time of algorithm Encode_A(G) is linear in the size of G.

Condition C2. The running time of algorithm Decode_A(Code_A(G)) is linear in the bit count of Code_A(G).

Condition C3. For all positive constants β with $\log \mathrm{num}(\mathbb{G}, n) \le \beta n + o(n)$, the bit count of Code_A(G) for an n-node graph G in $\mathbb{G}$ is no more than βn + o(n).

Note that Condition C3 basically says the bit count of Code_A(G) is information-theoretically optimal to within lower-order terms. Although there has been considerable work on compression schemes, trees (see, e.g., [11, 50, 67, 72]) and plane triangulations [79] were the only known nontrivial graph classes to admit optimal compression schemes. A graph class is hereditary if it is closed under taking subgraphs.

Below is the main result of the paper.

Theorem 1.1. Any hereditary class $\mathbb{G}$ of graphs with $\log \mathrm{num}(\mathbb{G}, n) = O(n)$ admits an optimal compression scheme, as long as each input n-node graph in $\mathbb{G}$ to be encoded comes with a genus-$o(n/\log^2 n)$ embedding.

By Theorem 1.1 and Mohar’s linear-time genus-O(1) embedding algorithm for genus-O(1) graphs [54, 70] (see Lemma 2.5), any hereditary class of genus-O(1) graphs admits an optimal compression scheme. For instance, our result yields the first-known optimal compression schemes for planar graphs, plane graphs, graphs embedded on genus-1 surfaces, graphs with genus 2 or less, 3-colorable directed plane graphs, 4-outerplanar graphs, and forests with degree at most 5. For nonhereditary graph classes, we also give an extension (see Corollary 5.1) of Theorem 1.1. As summarized in the following theorem, we show two classes of genus-O(1) graphs whose optimal compression schemes are obtainable via this extension, where the class of floorplans is defined in related work below.

Theorem 1.2. The following two classes of graphs admit optimal compression schemes:

(1) triangulations of a genus-g surface for any integral constant g,

(2) floorplans.

Technical overview. The kernel of the proof of Theorem 1.1 is a linear-time disjoint partition $G_0, \ldots, G_p$ of an n-node graph G embedded on a genus-$o(n/\log^2 n)$ surface.² Let poly(n) denote $O(n^{O(1)})$. Based upon Goodrich’s separator decomposition of planar graphs [40] and Djidjev and Venkatesan’s planarizer [26], partition $G_0, \ldots, G_p$ satisfies the following conditions, where $n_i$ is the number of nodes of $G_i$ and $d_i$ is the number of times that the nodes of $G_i$ are duplicated in some $G_j$ with $j \ne i$:³ (a) $n_0 = o(n/\log n)$, (b) $n_i = \mathrm{poly}(\log n)$ holds for each $i = 1, 2, \ldots, p$, (c) $\sum_{i=1}^{p} d_i = o(n/\log n)$, and (d) $\sum_{i=0}^{p} n_i = n + o(n/\log n)$. By condition (a), $G_0$ can be encoded in o(n) bits. By conditions (b) and (c), the information required to recover G from $G_0, G_1, \ldots, G_p$ can be encoded into o(n) bits (see Lemma 4.1). By condition (d), we have $\log \mathrm{num}(\mathbb{G}, n) \le o(n) + \sum_{i=1}^{p} \log \mathrm{num}(\mathbb{G}, n_i)$. Therefore, the disjoint partition reduces the problem of encoding an n-node graph in $\mathbb{G}$ to the problem of encoding a poly(log n)-node graph in $\mathbb{G}$. Applying such a reduction for one more level, it remains to encode a poly(log log n)-node graph in $\mathbb{G}$ into an information-theoretically optimal

²Precisely, the disjoint partition $G_0, \ldots, G_p$ of the edges of the embedded graph G in the proof of Theorem 1.1 is $G[V_0], G(V_1), \ldots, G(V_p)$, where $[V_0, \ldots, V_p]$ is both (i) a 1-separation $S_1$ of an arbitrary triangulation Δ of G and (ii) a refinement of the 0-separation $S_0 = [\emptyset, \mathrm{Node}(\Delta)]$ of Δ.

³As a matter of fact, in our construction, all duplicated nodes of $G_i$ with $i \ge 1$ belong to $G_0$.

Fig. 1. Three floorplans (a), (b), and (c) with 14 nodes, 6 internal faces, and 19 edges. Floorplans (a) and (b) are equivalent, and floorplans (b) and (c) are not equivalent.

number of bits, which can be resolved by the standard technique (see, e.g., [47, 72, 78]) of precomputation tables (see Lemma 2.3).

Related work. The compression scheme of Turán [96] encodes an n-node plane graph that may have self-loops into 12n bits.⁴ Keeler and Westbrook [55] improved this bit count to 10.74n. They also gave compression schemes for several families of plane graphs. In particular, they used 4.62n bits for plane triangulations and 9n bits for connected plane graphs free of self-loops and degree-one nodes. For plane triangulations, He, Kao, and Lu [46] improved the bit count to 4n. For triconnected plane graphs, He, Kao, and Lu [46] also improved the bit count to at most 8.585n bits. This bit count was later reduced to at most $\frac{9\log_2 3}{2}n \approx 7.134n$ by Chuang et al. [20]. For any given n-node graph G embedded on a genus-g surface, Deo and Litow [25] showed an O(ng)-bit encoding for G. These compression schemes all take linear time for encoding and decoding, but Condition C3 does not hold for them. The compression schemes of He, Kao, and Lu [47] (respectively, Blelloch and Farzan [14]) for planar graphs, plane graphs, and plane triangulations (respectively, separable graphs) satisfy Condition C3, but their encoding algorithms require Ω(n log n) time on n-node graphs.

Floorplanning is a fundamental issue in circuit layout [4, 8, 17, 24, 32, 43, 51, 57, 58, 62, 68, 69, 84, 91, 106, 108]. Motivated by VLSI physical design, various representations of floorplans were proposed [33, 109, 110]. Designing a floorplan to meet a certain criterion is NP-complete in general [44, 87, 100], so heuristic techniques such as simulated annealing [17, 101, 102] are practically useful. The length of the encoding affects the size of the search space. A floorplan, which is also known as a rectangular drawing, is a division of a rectangle into rectangular faces using horizontal and vertical line segments. Two floorplans are equivalent if they have the same adjacency relations and relative positions among the nodes. For instance, Figure 1 shows three floorplans: Floorplans (a) and (b) are equivalent. Floorplans (b) and (c) are not equivalent. Let G be the input n-node floorplan. Under the conventional assumption that each node of G, other than the four corner nodes, has exactly three neighbors (see, e.g., [45, 107]), one can verify that G has 0.5n faces and 1.5n − 2 edges. Yamanaka and Nakano [103] showed how to encode G into 2.5n bits. Chuang [19] reduced the bit count to 2.293n. Takahashi, Fujimaki, and Inoue [90] further reduced the bit count to 2n. All these compression schemes for floorplans satisfy Conditions C1 and C2, but not Condition C3. Takahashi, Fujimaki, and Inoue [90] also showed that the number of distinct n-node floorplans is no more than $3.375^{n+o(n)} \approx 2^{1.755n+o(n)}$. Therefore, our Theorem 1.2(2) encodes an n-node floorplan into at most 1.755n bits.
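As a quick numerical check of the constants quoted above, the following Python sketch converts a growth rate c (meaning roughly c^n objects of size n) into the leading coefficient of the bit count, log c. It evaluates the floorplan bound 3.375 and, for comparison, the growth rate 256/27 of plane triangulations from the introduction. The name `bits_per_node` is ours and is only illustrative.

```python
import math

# Upper bound on the number of n-node floorplans quoted in the text: 3.375^(n+o(n)).
FLOORPLAN_GROWTH = 3.375
# Growth rate of plane triangulations quoted in the introduction: (256/27)^n.
TRIANGULATION_GROWTH = 256 / 27


def bits_per_node(growth_rate: float) -> float:
    """Leading coefficient of log2(growth_rate ** n) = n * log2(growth_rate)."""
    return math.log2(growth_rate)


floorplan_bits = bits_per_node(FLOORPLAN_GROWTH)
triangulation_bits = bits_per_node(TRIANGULATION_GROWTH)
print(f"{floorplan_bits:.3f} {triangulation_bits:.4f}")  # prints "1.755 3.2451"
```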

For applications that require query support, Jacobson [50] gave a Θ(n)-bit encoding for a connected and simple planar graph G that supports traversal in Θ(log n) time per node visited. Munro and Raman [71] improved this result and gave schemes to encode binary trees, rooted ordered trees, and planar graphs. For a general n-node m-edge planar graph G, they used 2m + 8n bits while supporting adjacency and degree queries in O(1) time. Chuang et al. [20] reduced this bit count to $2m + (5 + \frac{1}{k})n$ for any constant k > 0 with the same query support. The bit count can be further reduced if only O(1)-time adjacency queries are supported, or if G is simple, triconnected, or triangulated [20]. Chiang, Lin, and Lu [18] reduced the number of bits to 2m + 2n. Yamanaka and Nakano [105] showed a 6n-bit encoding for plane triangulations with query support. The succinct encodings of Blandford, Blelloch, and Kash [13] and Blelloch and Farzan [14] for separable graphs support queries. Yamanaka and Nakano [104] also gave a compression scheme for floorplans with query support. For labeled planar graphs, Itai and Rodeh [49] gave an encoding of $\frac{3}{2} n \log n$ bits. For unlabeled general graphs, Naor [74] gave an encoding of $\frac{1}{2} n^2$ bits. For certain graph families, Kannan, Naor, and Rudich [52] gave schemes that encode each node with O(log n) bits and support O(log n)-time testing of adjacency between two nodes.

⁴For brevity, we omit all lower-order terms of bit counts in our discussion of related work.

Galperin and Wigderson [34] and Papadimitriou and Yannakakis [75] investigated complexity issues arising from encoding a graph by a small circuit that computes its adjacency matrix. Related work on various versions of succinct graph representations can be found in [6, 9, 28, 29, 30, 31, 37, 42, 53, 73, 76, 83] and the references therein.

Outline. The rest of the paper is organized as follows. Section 2 gives the preliminaries. Section 3 shows our algorithm for computing graph separations. Section 4 gives our optimal compression scheme for hereditary graph classes. Section 5 shows a methodology for obtaining optimal compression schemes for nonhereditary graph classes and applies this methodology on triangulations of genus-O(1) graphs and floorplans. Section 6 concludes the paper with a couple of open questions.

2. Preliminaries. Unless clearly stated otherwise, all graphs throughout the paper are simple, i.e., have no multiple edges or self-loops.

2.1. Segmentation prefix. Let $|X|$ denote the number of bits of binary string X. A binary string $X_0$ is a segmentation prefix of binary strings $X_1, \ldots, X_d$ if (a) it takes $O(\sum_{i=1}^{d} |X_i|)$ time to compute $X_0$ from $X_1, \ldots, X_d$ and (b) given the concatenation of $X_0, X_1, \ldots, X_d$, it takes $O(\sum_{i=0}^{d} |X_i|)$ time to recover all $X_i$ with $1 \le i \le d$.

Lemma 2.1 (see, e.g., [10, 27]). Any binary strings $X_1, \ldots, X_d$ with d = O(1) have a segmentation prefix with $O(\log \sum_{i=1}^{d} |X_i|)$ bits.

Lemma 2.2. Any binary strings $X_1, X_2, \ldots, X_d$ have an $O(\min\{m, d \log m\})$-bit segmentation prefix, where $m = |X_1| + \cdots + |X_d|$.

Proof. Let $X'$ be the concatenation of $X_1, \ldots, X_d$. If $m \le d \log m$, let $X''$ be the m-bit binary string with exactly d copies of 1-bits such that the jth bit of $X''$ is 1 if and only if $j = |X_1| + \cdots + |X_i|$ holds for some $i = 1, \ldots, d$. Otherwise, let $X''$ store the $O(\log m)$-bit numbers $|X_1| + \cdots + |X_i|$ for all $i = 1, \ldots, d$. Let $X'_0$ be the segmentation prefix of $X''$ and $X'$ as ensured by Lemma 2.1. The concatenation of $X'_0$ and $X''$ is a segmentation prefix $X_0$ of $X_1, \ldots, X_d$ with $O(\min\{m, d \log m\})$ bits. The lemma is proved.
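The two cases in the proof of Lemma 2.2 can be illustrated with a small Python sketch. It is not the paper's implementation: bit strings are Python strings of '0' and '1', all strings are assumed nonempty, Elias-gamma coding is our stand-in for the O(log m)-bit self-delimiting header that Lemma 2.1 provides, and every function name is hypothetical.

```python
from itertools import accumulate


def elias_gamma(value: int) -> str:
    """Self-delimiting code for a positive integer (our stand-in for Lemma 2.1)."""
    binary = bin(value)[2:]
    return "0" * (len(binary) - 1) + binary


def read_elias_gamma(bits: str, pos: int) -> tuple[int, int]:
    """Decode one Elias-gamma integer starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


def segmentation_prefix(strings: list[str]) -> str:
    """Build X_0 for nonempty bit strings X_1..X_d, following the two cases of the proof."""
    assert strings and all(strings), "this sketch assumes d >= 1 and nonempty strings"
    ends = list(accumulate(len(s) for s in strings))  # |X_1| + ... + |X_i|
    m, d = ends[-1], len(strings)
    width = m.bit_length()  # enough bits to store any prefix sum up to m
    if m <= d * width:
        # Case 1: an m-bit map whose jth bit is 1 iff some X_i ends at position j.
        end_set = set(ends)
        body = "1" + "".join("1" if j in end_set else "0" for j in range(1, m + 1))
    else:
        # Case 2: the d prefix sums, each written with `width` bits.
        body = "0" + elias_gamma(d) + "".join(format(e, f"0{width}b") for e in ends)
    return elias_gamma(m) + body


def split_concatenation(bits: str) -> list[str]:
    """Recover X_1..X_d from the concatenation X_0 X_1 ... X_d."""
    m, pos = read_elias_gamma(bits, 0)
    bitmap_mode = bits[pos] == "1"
    pos += 1
    if bitmap_mode:
        ends = [j + 1 for j in range(m) if bits[pos + j] == "1"]
        pos += m
    else:
        d, pos = read_elias_gamma(bits, pos)
        width = m.bit_length()
        ends = [int(bits[pos + i * width:pos + (i + 1) * width], 2) for i in range(d)]
        pos += d * width
    payload, pieces, start = bits[pos:], [], 0
    for end in ends:
        pieces.append(payload[start:end])
        start = end
    return pieces
```

With many short strings the bitmap case is chosen, and with a few long strings the list of prefix sums is chosen, mirroring the min{m, d log m} bound.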

For the rest of the paper, let $X_1 \circ \cdots \circ X_d$ be the concatenation of $X_0, X_1, \ldots, X_d$, where $X_0$ is the segmentation prefix of $X_1, \ldots, X_d$ as ensured by Lemma 2.2.

2.2. Precomputation table. Let |S| denote the cardinality of set S. Let Node(G) consist of the nodes in graph G, and let node(G) = |Node(G)|. For any subset V of Node(G), let G[V] denote the subgraph of G induced by V, and let G \ V denote the subgraph of G obtained by deleting V and their incident edges. Two disjoint subsets V and V′ of Node(G) are adjacent in G if there is an edge (v, v′) of G with v ∈ V and v′ ∈ V′. For any subset V of Node(G), let Nbr_G(V) consist of the nodes in Node(G) \ V that are adjacent to V in G, and let nbr_G(V) = |Nbr_G(V)|. A connected component of graph G is a maximal subset C of Node(G) such that G[C] is connected.

Fig. 2. (a) A 9-node plane graph G. (b) A separator decomposition T of G.

Lemma 2.3. Let $\mathbb{G}$ be a graph class satisfying $\log \mathrm{num}(\mathbb{G}, n) = O(n)$. Given positive integers $\ell$ and n with $\ell = \mathrm{poly}(\log \log n)$, it takes overall o(n) time to compute (i) a labeling Label(H) and a $\lceil \log \mathrm{num}(\mathbb{G}, \mathrm{node}(H)) \rceil$-bit binary string Optcode(H) for each distinct graph $H \in \mathbb{G}$ with at most $\ell$ nodes and (ii) an o(n)-bit string $\mathrm{Table}(\mathbb{G}, \ell)$ such that the following statements hold:

(1) Given a graph $H \in \mathbb{G}$ with $\mathrm{node}(H) \le \ell$, it takes O(node(H)) time to obtain Optcode(H) and Label(H) from $\mathrm{Table}(\mathbb{G}, \ell)$.

(2) Given Optcode(H) for a graph $H \in \mathbb{G}$ with $\mathrm{node}(H) \le \ell$, it takes O(node(H)) time to obtain H and Label(H) from $\mathrm{Table}(\mathbb{G}, \ell)$.

Proof. It is straightforward by $O(1)^{\mathrm{poly}(\ell)} = o(n)$.
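To make the idea of a precomputation table concrete, here is a toy Python sketch for planar graphs with at most four nodes (every simple graph on at most four nodes is planar, so the class restricted to this size is simply all small simple graphs). It enumerates the graphs up to isomorphism by brute force and assigns each a fixed-width code of ⌈log(count)⌉ bits. The functions `canonical_form`, `build_table`, and `optcode` are our own illustrative names, and the brute-force canonical form does not achieve the O(node(H)) lookup time stated in the lemma.

```python
import itertools
import math


def canonical_form(n, edges):
    """Lexicographically smallest edge list over all relabelings (brute force)."""
    best = None
    for perm in itertools.permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def build_table(max_nodes):
    """Return (encode, decode) dictionaries for all graphs with at most max_nodes nodes."""
    encode, decode = {}, {}
    for n in range(1, max_nodes + 1):
        pairs = list(itertools.combinations(range(n), 2))
        # One representative per isomorphism class of n-node simple graphs.
        classes = sorted({canonical_form(n, subset)
                          for r in range(len(pairs) + 1)
                          for subset in itertools.combinations(pairs, r)})
        width = math.ceil(math.log2(len(classes)))  # bits per code word for this n
        for index, form in enumerate(classes):
            code = format(index, f"0{width}b") if width else ""
            encode[(n, form)] = code
            decode[(n, code)] = form
    return encode, decode


def optcode(encode, n, edges):
    """Code word of an n-node graph given by an arbitrary labeling of its edges."""
    return encode[(n, canonical_form(n, edges))]


encode, decode = build_table(4)
```

For example, the 11 unlabeled graphs on four nodes each receive a 4-bit code word, and two differently labeled copies of the same path map to the same code.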

2.3. Separator decomposition of planar graphs. Sets $S_1, S_2, \ldots, S_d$ form a disjoint partition of set S if $S_1, \ldots, S_d$ are pairwise disjoint and $S = S_1 \cup \cdots \cup S_d$. A subset S of Node(G) is a separator of graph G with respect to $S_1$ and $S_2$ if (1) S, $S_1$, and $S_2$ form a disjoint partition of Node(G), (2) $S_1$ and $S_2$ are not adjacent in G, (3) $|S| = O(\mathrm{node}(G)^{1/2})$, and (4) $\max\{|S_1|, |S_2|\} \le \frac{2}{3} \cdot \mathrm{node}(G)$. A separator decomposition [12] of G is a rooted binary tree T on a disjoint partition of Node(G) such that the following two statements hold, where “nodes” specify elements of Node(G) and “vertices” specify elements of Node(T). Statement 1: Each leaf vertex of T consists of a single node of G. Statement 2: Each internal vertex S of T is a separator of G[Offspring(S)] with respect to Offspring($S_1$) and Offspring($S_2$), where $S_1$ and $S_2$ are the child vertices of S in T and Offspring(S) (respectively, Offspring($S_1$) and Offspring($S_2$)) is the union of all the vertices in the subtree of T rooted at S (respectively, $S_1$ and $S_2$). See Figure 2 for an illustration.

Lemma 2.4 (Goodrich [40]). It takes O(n) time to compute a separator decomposition for any given n-node planar graph.

2.4. Planarizers for nonplanar graphs. The genus of a graph G is defined to be the smallest integer g such that G can be embedded on an orientable surface with g handles without edge crossings [41]. For example, the genus of a planar graph is zero. By Euler’s formula (see, e.g., [39]), an n-node genus-O(n) graph has O(n) edges. Determining the genus of a general graph is NP-complete [93], but Mohar [70] showed that it takes linear time to determine whether a graph is of genus g for any g = O(1). Mohar’s algorithm is simplified by Kawarabayashi, Mohar, and Reed [54].

Fig. 3. (a) A 9-node plane graph with a separation $[V_0, \ldots, V_3]$. (b) $G[V_0]$, $G(V_1)$, $G(V_2)$, and $G(V_3)$ form a disjoint partition of the edges of G.

Lemma 2.5 (Kawarabayashi, Mohar, and Reed [54] and Mohar [70]). It takes O(n) time to compute a genus-O(1) embedding for any given n-node genus-O(1) graph.

Gilbert, Hutchinson, and Tarjan [39] gave an O(n + g)-time algorithm to compute an O((gn)^0.5)-node separator of an n-node genus-g graph, generalizing Lipton and Tarjan’s classic separator theorem for planar graphs [63]. Our result relies on the following planarization algorithm.

Lemma 2.6 (Djidjev and Venkatesan [26]). Given an n-node graph G embedded on a genus-g surface, it takes O(n + g) time to compute a subset V of Node(G) with $|V| = O((gn)^{0.5})$ such that G \ V is planar.

3. Separation and refinement. We say that $[V_0, V_1, \ldots, V_p]$ with $p \ge 1$ is a separation of graph G if the following properties hold:

Property S1. $V_0, V_1, \ldots, V_p$ form a disjoint partition of Node(G).

Property S2. Any two $V_i$ and $V_{i'}$ with $1 \le i \ne i' \le p$ are not adjacent in G.

Figure 3(a) shows a separation $[V_0, V_1, V_2, V_3]$ of graph G, and Figure 4(a) shows another separation $[U_0, U_1, U_2]$ of G. For any subset V of Node(G), let G(V) be the subgraph of G induced by $V \cup \mathrm{Nbr}_G(V)$ excluding the edges of $G[\mathrm{Nbr}_G(V)]$. If $[V_0, \ldots, V_p]$ is a separation of G, then $G[V_0], G(V_1), \ldots, G(V_p)$ form a disjoint partition of the edges of G. See Figures 3(b) and 4(b) for illustrations. Let $\log^{(0)} n = n$. For any positive integer k, let $\log^{(k)} n = \log(\log^{(k-1)} n)$. For notational brevity, for any nonnegative integer k, let

$\ell_k = \max\{1, \log^{(k)} n\}$.
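A minimal Python sketch of the iterated logarithm bound $\ell_k$ may help fix the definition. The function name `ell` is ours. One assumption is made explicit: once the running value drops to 1 or below, further logarithms are nonpositive or undefined, so the sketch returns 1 for that k and every larger k.

```python
import math


def ell(n: float, k: int) -> float:
    """max{1, log^(k) n} with base-2 logarithms, where log^(0) n = n."""
    value = float(n)
    for _ in range(k):
        if value <= 1.0:
            # Assumption: treat nonpositive or undefined iterated logs as below 1.
            return 1.0
        value = math.log2(value)
    return max(1.0, value)


# For n = 65536 the values for k = 0, 1, 2, 3, 4 are 65536, 16, 4, 2, 1.
values = [ell(65536, k) for k in range(5)]
```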

For any nonnegative integer k, separation $[V_0, \ldots, V_p]$ of an n-node graph G is a k-separation of G if the following three properties hold:

Property S3. $|V_0| = o(n/\ell_k)$ and $p = o(n/\ell_k) + 1$.

Property S4. $|V_i| + \mathrm{nbr}_G(V_i) = \mathrm{poly}(\ell_k)$ holds for each $i = 1, \ldots, p$.

Property S5. $\sum_{i=1}^{p} \mathrm{nbr}_G(V_i) = o(n/\ell_k)$.
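The combinatorial part of these definitions (Properties S1 and S2, and the claim that $G[V_0], G(V_1), \ldots, G(V_p)$ partition the edges) can be checked mechanically. The sketch below is only an illustration under our own conventions: the graph is an adjacency dictionary, a separation is a list of node sets with $V_0$ first, and the function names are hypothetical. The asymptotic Properties S3 to S5 are not checked, since they concern growth rates rather than a single instance.

```python
def neighbors_of_set(adj, part):
    """Nbr_G(V): nodes outside V that are adjacent to some node of V."""
    part = set(part)
    return {u for v in part for u in adj[v]} - part


def is_separation(adj, parts):
    """Check Properties S1 and S2 for parts = [V0, V1, ..., Vp]."""
    all_nodes = [v for part in parts for v in part]
    if len(all_nodes) != len(set(all_nodes)) or set(all_nodes) != set(adj):
        return False  # S1 fails: not a disjoint partition of Node(G)
    owner = {v: i for i, part in enumerate(parts) for v in part}
    for v in adj:
        for u in adj[v]:
            if owner[u] != owner[v] and owner[u] != 0 and owner[v] != 0:
                return False  # S2 fails: two parts other than V0 are adjacent
    return True


def edge_partition(adj, parts):
    """Edge sets of G[V0], G(V1), ..., G(Vp), each edge stored as a frozenset."""
    v0 = set(parts[0])
    pieces = [{frozenset((u, v)) for u in v0 for v in adj[u] if v in v0}]
    for part in parts[1:]:
        # G(V) keeps exactly the edges with at least one endpoint in V.
        pieces.append({frozenset((u, v)) for u in part for v in adj[u]})
    return pieces


# A path 0-1-2-3-4 separated by its middle node.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
separation = [{2}, {0, 1}, {3, 4}]
pieces = edge_partition(path, separation)
```

In this example $\mathrm{Nbr}_G(V_1) = \mathrm{Nbr}_G(V_2) = \{2\}$, so the sum in Property S5 equals 2.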

One can easily verify that $[\emptyset, \mathrm{Node}(G)]$ is a 0-separation of G.⁵ Let $[V_0, \ldots, V_p]$ and $[U_0, \ldots, U_q]$ be two separations of graph G. We say that $[V_0, \ldots, V_p]$ is a refinement of $[U_0, \ldots, U_q]$ if the following three properties hold:

⁵The “+1” in Property S3 is redundant for $k \ge 1$. However, we need it so that $[\emptyset, \mathrm{Node}(G)]$ is a 0-separation of G, since $1 \ne o(n/\ell_0)$.

Fig. 4. (a) Separation $[V_0, V_1, V_2, V_3]$ is a refinement of separation $[U_0, U_1, U_2]$. (b) Subgraphs $G[U_0]$, $G(U_1)$, and $G(U_2)$ of G.

Property R1. $U_0 \subseteq V_0$.

Property R2. For each index $i = 1, \ldots, p$, there is an index j with $1 \le j \le q$ and $V_i \subseteq U_j$.

Property R3. For any indices $i, i', i''$ with $1 \le i < i' < i'' \le p$, if $V_i \cup V_{i''} \subseteq U_j$, then $V_{i'} \subseteq U_j$.
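Properties R1 to R3 can be tested on concrete inputs with a short Python sketch. It assumes that both arguments are already separations given as lists of node sets and that every part other than the first is nonempty, so each $V_i$ has a unique containing $U_j$; under that assumption R3 says that the parts contained in the same $U_j$ have consecutive indices. The function name is ours.

```python
def is_refinement(fine, coarse):
    """Check R1, R2, R3 for fine = [V0..Vp] against coarse = [U0..Uq]."""
    if not set(coarse[0]) <= set(fine[0]):
        return False  # R1 fails: U0 is not contained in V0
    containers = []
    for part in fine[1:]:
        js = [j for j in range(1, len(coarse)) if set(part) <= set(coarse[j])]
        if not js:
            return False  # R2 fails: V_i lies in no single U_j
        containers.append(js[0])
    # R3: the indices i with V_i inside a fixed U_j must form one contiguous run.
    finished, previous = set(), None
    for j in containers:
        if j != previous:
            if j in finished:
                return False  # U_j reappears after a different U_j' in between
            if previous is not None:
                finished.add(previous)
            previous = j
    return True
```

For instance, any separation refines the 0-separation $[\emptyset, \mathrm{Node}(G)]$, while listing the parts inside one $U_j$ in a non-contiguous order violates R3.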

For instance, in Figure 4(a), [V0, V1, V2, V3] is a reﬁnement of [U0, U1, U2]. Below is the main lemma of the section.

Lemma 3.1. Let k be a positive integer. Let G be an n-node connected graph embedded on a genus-$o(n/\ell_k^2)$ surface. Given a (k − 1)-separation $S_{k-1}$ of G, it takes O(n) time to compute a k-separation $S_k$ of G that is a refinement of $S_{k-1}$.

The proof of Lemma 3.1 needs the following lemma, which can be proved by Lemmas 2.4 and 2.6.

Lemma 3.2. Let k be a positive integer. Given an n-node graph G embedded on a genus-$o(n/\ell_k^2)$ surface, it takes O(n) time to compute an $o(n/\ell_k)$-node subset V of Node(G) such that each node of Node(G) \ V has degree at most $\ell_k^2$ in G and each connected component of G \ V has at most $\ell_k^4$ nodes.

Proof. We first apply Lemma 2.6 to compute in O(n) time an $o(n/\ell_k)$-node subset $V'$ of Node(G) such that $G \setminus V'$ is planar. We then apply Lemma 2.4 to compute in O(n) time a separator decomposition T of $G \setminus V'$. For each vertex S of T, let Offspring(S) denote the union of all the vertices in the subtree of T rooted at S, and let offspring(S) = |Offspring(S)|. Let $r = \ell_k^2$. Let $V''$ consist of the nodes of G with degree more than r in G. Let $V'''$ be the union of all the vertices S of T with $\mathrm{offspring}(S) > r^2$. Let $V = V' \cup V'' \cup V'''$. By $V' \cup V''' \subseteq V$ and the definition of T, each connected component of $G \setminus V$ has at most $r^2$ nodes. By $V'' \subseteq V$, each node of Node(G) \ V has degree at most r in G. Since G has O(n) edges, $|V''| = O(n/r) = o(n/\ell_k)$.

It remains to show that $|V'''| = o(n/\ell_k)$. For each index $i \ge 1$, let $I_i$ consist of the vertices S of T with $r^2 \cdot (\frac{3}{2})^{i-1} < \mathrm{offspring}(S) \le r^2 \cdot (\frac{3}{2})^{i}$. By $r^2 \ge 1$ and $i \ge 1$, each $S \in I_i$ is an internal vertex of T. By definition of T, we know that Offspring(S) and Offspring($S'$) are disjoint for any two distinct elements S and $S'$ of $I_i$, implying that $\sum_{S \in I_i} \mathrm{offspring}(S) \le n$ holds. Since $\mathrm{offspring}(S) > r^2 \cdot (1.5)^{i-1}$ holds for each $S \in I_i$, we have $|I_i| < \frac{n}{r^2 \cdot (1.5)^{i-1}}$. Since each $S \in I_i$ is an internal vertex of T, S is a separator of G[Offspring(S)]. Therefore, $|S| = O(r \cdot (1.5)^{i/2})$ holds for each vertex S in $I_i$. We have

$|V'''| = \sum_{i \ge 1} \sum_{S \in I_i} |S| = \sum_{i \ge 1} O\left(\frac{n}{r \cdot (1.5)^{i/2}}\right) = O\left(\frac{n}{r}\right) = o\left(\frac{n}{\ell_k}\right)$.

The lemma is proved.


Algorithm 1

Let p = 0, and let all elements of $\mathcal{C}$ be initially unmarked.

For each $j = 1, \ldots, q$, perform the following repeat-loop.

  Repeat the following steps until all elements of $\mathcal{C}_j$ are marked:

    Let $v_0$ be an arbitrary node of $V_0$ adjacent to some unmarked element of $\mathcal{C}_j$.

    Let $\mathcal{U}$ consist of the unmarked elements of $\mathcal{C}_j$ that are adjacent to $v_0$ in G.

    Let $C_{i_1}, \ldots, C_{i_3}$ be the elements of $\mathcal{U}$ in clockwise order around $v_0$ in G.

    Mark all $i_3 - i_1 + 1$ elements of $\mathcal{U}$.

    Repeat the following four steps until $i_1 > i_3$:

      Let $i_2$ be the largest index with $i_1 \le i_2 \le i_3$ and $|C_{i_1}| + \cdots + |C_{i_2}| \le \ell_k^4$.

      Let p = p + 1.

      Let $\mathrm{hook}_p = v_0$ and $V_p = C_{i_1} \cup \cdots \cup C_{i_2}$.

      Let $i_1 = i_2 + 1$.

Output $V_1, \ldots, V_p$ and $\mathrm{hook}_1, \ldots, \mathrm{hook}_p$.
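The grouping performed by Algorithm 1 can be sketched in Python as follows. This is an illustration under our own assumptions rather than the paper's linear-time implementation: the embedding is given as a rotation system (`rotation[v]` lists the neighbors of v in clockwise order), components are frozensets, `groups[j]` plays the role of $\mathcal{C}_j$, `cap` plays the role of $\ell_k^4$, the clockwise scan starts at the first entry of the rotation list, and a component met several times around the hook is taken at its first occurrence. As in the proof, every component is assumed to be adjacent to some node of $V_0$ and to have at most `cap` nodes.

```python
def unmarked_in_clockwise_order(rotation, hook, component_of, group, marked):
    """Unmarked components of `group` adjacent to `hook`, in clockwise order."""
    ordered = []
    for u in rotation[hook]:
        comp = component_of.get(u)  # None when u belongs to V0
        if comp in group and comp not in marked and comp not in ordered:
            ordered.append(comp)
    return ordered


def algorithm1(rotation, v0_nodes, groups, cap):
    """Return the lists [V_1..V_p] and [hook_1..hook_p]."""
    component_of = {v: comp for group in groups for comp in group for v in comp}
    marked, parts, hooks = set(), [], []
    for group in groups:
        group_set = set(group)
        while not group_set <= marked:
            # An arbitrary node of V0 adjacent to some unmarked component.
            hook = next(v for v in v0_nodes
                        if unmarked_in_clockwise_order(rotation, v, component_of,
                                                       group_set, marked))
            ordered = unmarked_in_clockwise_order(rotation, hook, component_of,
                                                  group_set, marked)
            marked.update(ordered)
            i1 = 0
            while i1 < len(ordered):
                # Greedily extend the run while the total size stays within cap.
                i2, size = i1, len(ordered[i1])
                while i2 + 1 < len(ordered) and size + len(ordered[i2 + 1]) <= cap:
                    i2 += 1
                    size += len(ordered[i2])
                parts.append(frozenset().union(*ordered[i1:i2 + 1]))
                hooks.append(hook)
                i1 = i2 + 1
    return parts, hooks
```

With four components of sizes 3, 2, 2, and 1 around a single hook and a cap of 7, the first three are merged into one part and the last forms a part of its own, both sharing the same hook.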

Fig. 5. An illustration for Algorithm 1.

Proof of Lemma 3.1. Suppose that $[U_0, \ldots, U_q]$ is the given (k − 1)-separation $S_{k-1}$. Let $V'_0$ be the O(n)-time computable subset of Node(G) ensured by Lemma 3.2. We have $|V'_0| = o(n/\ell_k)$. Let $V_0 = U_0 \cup V'_0$. Let $\mathcal{C}$ consist of the connected components of $G \setminus V_0$. By $V'_0 \subseteq V_0$, each element of $\mathcal{C}$ has at most $\ell_k^4$ nodes. By $U_0 \subseteq V_0$ and Properties S1 and S2 of $S_{k-1}$, each element of $\mathcal{C}$ is contained in some $U_j$ with $1 \le j \le q$. For each $j = 1, \ldots, q$, let $\mathcal{C}_j$ consist of the elements C of $\mathcal{C}$ with $C \subseteq U_j$. We run Algorithm 1 to obtain (a) a disjoint partition $V_1, \ldots, V_p$ of $G \setminus V_0$ and (b) p nodes $\mathrm{hook}_1, \ldots, \mathrm{hook}_p$ of $V_0$, which may not be distinct. Let $S_k = [V_0, \ldots, V_p]$. Since G is connected, each element of $\mathcal{C}$ is adjacent to $V_0$. The first statement of the outer repeat-loop is well defined. Since each element of $\mathcal{C}$ has at most $\ell_k^4$ nodes, the first statement of the inner repeat-loop is well defined. See Figure 5 for an illustration:

Suppose that all nodes are in $U_1$. All nodes are initially unmarked. Let $V_0$ consist of the nine unlabeled nodes, including the three gray nodes. For each $i = 1, \ldots, 6$, let $C_i$ consist of the nodes with label i. That is, $C_1, \ldots, C_6$ are the six connected components of $G \setminus V_0$. Suppose that $\ell_k^4 = 7$ and the first two iterations of the outer repeat-loop obtain $V_1 = C_1$ and $V_2 = C_2$. In the third iteration of the outer repeat-loop, $C_3, \ldots, C_6$ are the unmarked elements of $\mathcal{C}$ that are adjacent to $\mathrm{hook}_3$ in clockwise order around $\mathrm{hook}_3$. By $|C_3| + |C_4| + |C_5| = 7$, the two iterations of the inner repeat-loop obtain $V_3 = C_3 \cup C_4 \cup C_5$ and $V_4 = C_6$.

By definition of Algorithm 1, one can verify that Properties R1, R2, and R3 hold for $S_{k-1}$ and $S_k$ (that is, $S_k$ is a refinement of $S_{k-1}$) and Properties S1 and S2 hold for $S_k$. By Property S3 of $S_{k-1}$, we have $|U_0| = o(n/\ell_{k-1}) = o(n/\ell_k)$. By $|V'_0| = o(n/\ell_k)$,

Fig. 6. Panels (a), (b), and (c): the operation that contracts all nodes of $V_i$ into a node $v_i$, which takes over some neighbors of $\mathrm{hook}_i$.

we have $|V_0| \le |U_0| + |V'_0| = o(n/\ell_k)$. Let $I_{\mathrm{small}}$ consist of the indices i with $1 \le i \le p$ and $|V_i| \le \frac{1}{2} \cdot \ell_k^4$. Let $I_{\mathrm{large}}$ consist of the indices i with $1 \le i \le p$ and $|V_i| > \frac{1}{2} \cdot \ell_k^4$. We show that $p = |I_{\mathrm{small}}| + |I_{\mathrm{large}}| = o(n/\ell_k)$ as follows. By Property S1 of $S_k$, we have $|I_{\mathrm{large}}| = o(n/\ell_k)$. To show that $|I_{\mathrm{small}}| = o(n/\ell_k)$, we categorize the indices i in $I_{\mathrm{small}}$ with $1 \le i < p$ into the following types, where j is the index with $V_i \subseteq U_j$:

Type 1: $i \in I_{\mathrm{small}}$ and $i + 1 \in I_{\mathrm{large}}$. The number of such indices i is no more than $|I_{\mathrm{large}}| = o(n/\ell_k)$.

Type 2: $i \in I_{\mathrm{small}}$ and $i + 1 \in I_{\mathrm{small}}$.

Type 2a: $V_{i+1} \subseteq U_{j+1}$. The number of such indices i is no more than $q = o(n/\ell_{k-1}) = o(n/\ell_k)$.

Type 2b: $V_{i+1} \subseteq U_j$ and $\mathrm{hook}_i \in V_0 \setminus U_0$. By Properties S1 and S2 of $S_{k-1}$, we know that $\mathrm{hook}_i \in U_j$. By definition of Algorithm 1, $\mathrm{hook}_i \ne \mathrm{hook}_{i'}$ holds for all indices $i'$ with $i < i' \le p$. The number of such indices i is no more than $|V_0 \setminus U_0| \le |V'_0| = o(n/\ell_k)$.

Type 2c: $V_{i+1} \subseteq U_j$ and $\mathrm{hook}_i \in U_0$. We have $\mathrm{hook}_i \in \mathrm{Nbr}_G(U_j)$. By definition of Algorithm 1, $\mathrm{hook}_i \ne \mathrm{hook}_{i'}$ holds for all indices $i' > i$ with $V_{i'} \subseteq U_j$. By Property S5 of $S_{k-1}$, the number of such indices i is no more than $\sum_{j=1}^{q} \mathrm{nbr}_G(U_j) = o(n/\ell_{k-1}) = o(n/\ell_k)$.

We have $p = o(n/\ell_k)$. Property S3 holds for $S_k$. By definition of Algorithm 1, $|V_i| \le \ell_k^4$ holds for each $i = 1, \ldots, p$. By $V'_0 \subseteq V_0$, each node of Node(G) \ $V_0$ has degree at most $\ell_k^2$. Property S4 holds for $S_k$.

To see Property S5 of $S_k$, we obtain a contracted graph from G by performing the following two steps for each $i = 1, \ldots, p$.⁶ Step 1: Let $C_{i_1}, \ldots, C_{i_2}$ be the elements of $\mathcal{C}$ with $V_i = C_{i_1} \cup C_{i_1+1} \cup \cdots \cup C_{i_2}$ in clockwise order around $\mathrm{hook}_i$ in G. Split $\mathrm{hook}_i$ into two adjacent nodes $\mathrm{hook}_i$ and $v_i$, and let $v_i$ take over the neighbors of $\mathrm{hook}_i$ in clockwise order around $\mathrm{hook}_i$ from the first neighbor of $\mathrm{hook}_i$ in $C_{i_1}$ to the first neighbor of $\mathrm{hook}_i$ in $C_{i_2}$. Step 2: Contract all nodes of $V_i$ into node $v_i$, and delete multiple edges and self-loops. See Figure 6 for an illustration: For each $i = 3, \ldots, 6$, let $C_i$ consist of the nodes with labels i in Figure 6(a). Suppose that $i_1 = 3$, $i_2 = 5$, and $V_i = C_3 \cup C_4 \cup C_5$. The unlabeled circle nodes belong to $V_0$. The square nodes are two previously contracted nodes $v_{i'}$ and $v_{i''}$ from $V_{i'}$ and $V_{i''}$ for some indices $i'$ and $i''$ with $1 \le i' \ne i'' < i$. Figure 6(b) shows the result of Step 1. Figure 6(c) shows the result of Step 2. Observe that each node that is adjacent to $V_i$ becomes a

⁶The contraction procedure is only for proving Property S5 of $S_k$; it is not needed for computing $S_k$.

Fig. 7. (a) Graph G with a labeling. (b) Subgraphs $G(V_1)$, $G(V_2)$, and $G(V_3)$ of G with labelings. (c) Subgraphs $G(U_1)$ and $G(U_2)$ of G with labelings.

neighbor of $v_i$ after applying Steps 1 and 2. Also, each neighbor of $\mathrm{hook}_i$ that is not in $V_i$ either remains a neighbor of $\mathrm{hook}_i$ or becomes a neighbor of $v_i$ after applying Steps 1 and 2. Therefore, for each $i = 1, \ldots, p$ and each node $v_0 \in \mathrm{Nbr}_G(V_i)$, there is either an edge $(v_0, v_i)$ or an edge $(v_i, v_{i'})$ for some index $i'$ with $i' > i$ and $\mathrm{hook}_{i'} = v_0$. Thus, $\sum_{i=1}^{p} \mathrm{nbr}_G(V_i)$ is no more than the number of edges in the resulting contracted simple graph, which has $|V_0| + p = o(n/\ell_k)$ nodes. Observe that Step 1 does not increase the genus of the embedding. Since the subgraph induced by $V_i \cup \{v_i\}$ is connected, Step 2 does not increase the genus of the embedding either. The number of edges in the resulting contracted simple genus-$o(n/\ell_k^2)$ graph is $o(n/\ell_k)$. Property S5 holds for $S_k$. The lemma is proved.

4. Our compression scheme. This section proves Theorem 1.1.

4.1. Recovery string. A labeling of graph G is a one-to-one mapping from Node(G) to {0, 1, . . . , node(G) − 1}. For instance, Figure 7(a) shows a labeling for graph G. Let G be a graph embedded on a surface. We say that a graph Δ embedded on the same surface is a triangulation of G if G is a subgraph of Δ with Node(Δ) = Node(G) such that each face of Δ has three nodes. The following lemma shows an o(n)-bit string with which the larger embedded labeled subgraphs of G can be recovered from smaller embedded labeled subgraphs of G in O(n) time.

Lemma 4.1. Let k be a positive integer. Let G be an n-node graph embedded on a genus-$o(n/\ell_k)$ surface. Let Δ be a triangulation of G. Let $S_k = [V_0, \ldots, V_p]$ be a given k-separation of Δ and $S_{k-1} = [U_0, \ldots, U_q]$ be a given (k − 1)-separation of Δ such that $S_k$ is a refinement of $S_{k-1}$. For any given labeling $L_{k,i}$ of $G(V_i)$ for each $i = 1, \ldots, p$, the following statements hold:

(1) It takes overall O(n) time to compute a labeling $L_{k-1,j}$ of subgraph $G(U_j)$ for each $j = 1, \ldots, q$.

(2) Given the above labelings $L_{k-1,j}$ of subgraphs $G(U_j)$ with $1 \le j \le q$, it takes O(n) time to compute an o(n)-bit string $\mathrm{Rec}_k$ such that $G(U_j)$ and $L_{k-1,j}$ for all $j = 1, \ldots, q$ can be recovered in overall O(n) time from $\mathrm{Rec}_k$ and $G(V_i)$ and $L_{k,i}$ for all $i = 1, \ldots, p$.

Proof. Since G is a subgraph of Δ with Node(Δ) = Node(G), one can easily verify that $S_{k-1}$ (respectively, $S_k$) is also a (k − 1)-separation (respectively, k-separation) of G. For each $j = 1, \ldots, q$, let $I_j$ consist of the indices i with $V_i \subseteq U_j$. Let $W_j$ consist of the nodes of $G(U_j)$ that are not in any $V_i$ with $i \in I_j$. By Properties S1 and S2 of $S_k$, $W_j \subseteq V_0$. For instance, if G is as shown in Figure 7(a), where $v_t$ with $0 \le t \le 8$ denotes the node with label t, we have $I_1 = \{1\}$, $I_2 = \{2, 3\}$, $W_1 = \{v_2, v_3\}$, and $W_2 = \{v_0, v_1, v_2, v_6\}$. Let the labeling $L_{k-1,j}$ for $G(U_j)$ be defined as follows:

• For the nodes of $G(U_j)$ in $W_j$, let $L_{k-1,j}$ be an arbitrary one-to-one mapping from $W_j$ to $\{0, 1, \ldots, |W_j| - 1\}$. In Figure 7(c), we have $L_{k-1,1}(v_2) = 1$, $L_{k-1,1}(v_3) = 0$, $L_{k-1,2}(v_0) = 2$, $L_{k-1,2}(v_1) = 3$, $L_{k-1,2}(v_2) = 0$, and $L_{k-1,2}(v_6) = 1$.

• For the nodes of $G(U_j)$ not in $W_j$, let $L_{k-1,j}$ be the one-to-one mapping from $\bigcup_{i \in I_j} V_i$ to $\{|W_j|, |W_j| + 1, \ldots, \mathrm{node}(G(U_j)) - 1\}$ obtained by sorting $(i, L_{k,i}(v))$ for all indices $i \in I_j$ and all nodes $v \in V_i$ such that $L_{k-1,j}(v) < L_{k-1,j}(v')$ holds for a node v of $V_i$ and a node $v'$ of $V_{i'}$ if and only if (a) $i < i'$ or (b) $i = i'$ and $L_{k,i}(v) < L_{k,i}(v')$. For instance, if $L_{k,1}$, $L_{k,2}$, and $L_{k,3}$ are as shown in Figure 7(b), then $L_{k-1,1}$ and $L_{k-1,2}$ can be as shown in Figure 7(c) and $L_{k-2,1}$ can be as shown in Figure 7(a).

It takes $O(\mathrm{node}(G(U_j))) = O(|U_j| + \mathrm{nbr}_G(U_j))$ time to compute $L_{k-1,j}$ from all $L_{k,i}$ with $i \in I_j$. By Property S5 of $S_{k-1}$, it takes overall O(n) time to compute all $L_{k-1,j}$ with $1 \le j \le q$ from all $L_{k,i}$ with $1 \le i \le p$. Statement (1) is proved.
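The construction of $L_{k-1,j}$ from the labelings $L_{k,i}$ can be sketched in Python as follows. The data layout is our own assumption: `w_nodes` lists the nodes of $W_j$ in the arbitrary order chosen for them, `parts` maps each index $i \in I_j$ to the node set $V_i$, and `part_labelings[i]` is the labeling $L_{k,i}$ of $G(V_i)$ as a dictionary. The sketch uses Python's comparison sort for brevity, so it does not reproduce the linear running time claimed above.

```python
def merge_labelings(w_nodes, parts, part_labelings):
    """Build the labeling of G(U_j) from the labelings of the G(V_i) with i in I_j."""
    # Nodes of W_j receive the labels 0 .. |W_j| - 1 in the given order.
    labeling = {v: rank for rank, v in enumerate(w_nodes)}
    # The remaining nodes are ranked by (i, L_{k,i}(v)), as in the definition.
    keyed = sorted((i, part_labelings[i][v], v) for i in parts for v in parts[i])
    for offset, (_, _, v) in enumerate(keyed):
        labeling[v] = len(w_nodes) + offset
    return labeling


# A made-up instance with W_j = {x, y}, V_2 = {a, b}, and V_3 = {c}.
example = merge_labelings(
    ["x", "y"],
    {2: {"a", "b"}, 3: {"c"}},
    {2: {"a": 1, "b": 0, "x": 2}, 3: {"c": 0, "y": 1}},
)
```

In this instance x and y keep labels 0 and 1, the nodes of $V_2$ follow in the order of their own labels (b, then a), and c comes last.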

By Property S4 of $S_{k-1}$, the label of each node of $G(U_j)$ assigned by $L_{k-1,j}$ can be represented by $O(\log \mathrm{poly}(\ell_{k-1})) = O(\ell_k)$ bits. By Property S4 of $S_k$, the label of each node of $G(V_i)$ assigned by $L_{k,i}$ can be represented by $O(\log \mathrm{poly}(\ell_k)) = O(\ell_{k+1})$ bits. For each index $j = 1, \ldots, q$,

• string $\mathrm{Rec}'_{k,j}$ stores the adjacency list of the embedded subgraph of $G(V_j)$ induced by $W_j$ via the labeling $L_{k-1,j}$ of $W_j$,

• string $\mathrm{Rec}''_{k,j}$ stores the information required to recover $L_{k-1,j}$ from all $L_{k,i}$ with $i \in I_j$, and

• string $\mathrm{Rec}'''_{k,j}$ stores the information required to recover the embedding of $G(U_j)$ from the embeddings of all $G(V_i)$ with $i \in I_j$ and the embedding of the subgraph of $G(U_j)$ induced by $W_j$.

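By definition of $W_j$, we have $|W_j| = |V_0 \cap U_j| + \mathrm{nbr}_G(U_j)$. It follows from Property S3 of $S_k$ and Property S5 of $S_{k-1}$ that

$\sum_{j=1}^{q} |W_j| \le |V_0| + \sum_{j=1}^{q} \mathrm{nbr}_G(U_j) = o\left(\frac{n}{\ell_k}\right) + o\left(\frac{n}{\ell_{k-1}}\right) = o\left(\frac{n}{\ell_k}\right)$.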

Let $W = \bigcup_{j=1}^{q} W_j$. Since $G[V_0], G(V_1), \ldots, G(V_p)$ form a disjoint partition of the edges of G, the overall number of edges in the subgraphs of $G(V_j)$ induced by $W_j$ for all $j = 1, \ldots, q$ is no more than the number of edges in G[W], which is $O(|W| + o(n/\ell_k)) \le O(\sum_{j=1}^{q} |W_j|) + o(n/\ell_k) = o(n/\ell_k)$. Therefore,

(1) $\sum_{j=1}^{q} |\mathrm{Rec}'_{k,j}| = o\left(\frac{n}{\ell_k}\right) \cdot O(\ell_k) = o(n)$.

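It suffices for $\mathrm{Rec}''_{k,j}$ to store the list of $(i, L_{k,i}(v), L_{k-1,j}(v))$ for all $i \in I_j$ and all $v \in \mathrm{Nbr}_G(V_i)$. By Property R3 of $S_{k-1}$ and $S_k$ and Property S4 of $S_{k-1}$, index i can be represented by an $O(\ell_k)$-bit offset t such that i is the tth smallest index in $I_j$. Thus, $|\mathrm{Rec}''_{k,j}| = \sum_{i \in I_j} \mathrm{nbr}_G(V_i) \cdot O(\ell_k)$. By Property S5 of $S_k$, we have $\sum_{j=1}^{q} \sum_{i \in I_j} \mathrm{nbr}_G(V_i) = \sum_{i=1}^{p} \mathrm{nbr}_G(V_i) = o(n/\ell_k)$. Therefore,

(2) $\sum_{j=1}^{q} |\mathrm{Rec}''_{k,j}| = o\left(\frac{n}{\ell_k}\right) \cdot O(\ell_k) = o(n)$.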