Property Testing on Directed Graphs and Abelian Groups

Doctoral Dissertation

Department of Computer Science and Information Engineering
College of Electrical Engineering and Computer Science
National Taiwan University

Yen-Wu Ti (狄彥吾)

Advisor: Yuh-Dauh Lyuu (呂育道), Ph.D.

June 2009

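Acknowledgments

First, I sincerely thank my advisor, Dr. Yuh-Dauh Lyuu. His careful teaching gave me a glimpse of the depth of computer science, and his frequent discussions and guidance pointed me in the right direction. I have benefited greatly over these years. His rigor in scholarship is a model for all of us.

Over these seven years, life in the laboratory was full of shared moments, academic discussions, and idle chatter. I thank my seniors, classmates, and juniors for their mutual encouragement; your company made seven years of research life colorful.

I thank my seniors 周立平 and 戴天時, who tirelessly pointed out the shortcomings in my research and always resolved my doubts when I was lost. I also thank my classmates 袁勤國, 陳俊諺, 張文彥, and 林敏順 for their help; congratulations to us all on making it through these years. Nor can I forget my juniors in the laboratory, 張中平, 馬德文, 張經略, 林宏佑, and 陳絢昌; I am deeply grateful for your help.

Finally, I dedicate this dissertation to my late parents.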

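Abstract (Chinese)

Property testing is a widely studied topic in computer science, with applications in areas such as network topology and program debugging. Bender and Ron developed a property-testing algorithm that tests whether certain digraphs are strongly connected; this dissertation calls it the BR algorithm. The BR algorithm applies only to digraphs that satisfy specific conditions, which limits its applicability. This dissertation develops an unrestricted algorithm that can test whether any digraph is strongly connected.

Compared with the restricted BR algorithm, the strong connectivity tester developed in this dissertation has no restrictions but is less efficient. Therefore, for the few digraphs that satisfy the restrictions of the BR algorithm, we still prefer the BR algorithm; for digraphs that do not, we use the algorithm developed here.

This dissertation then develops an algorithm that helps us choose the appropriate strong connectivity tester between the two.

For a fixed digraph H, Alon and Shapira proved that for any digraph G, if a large number of edges must be removed from G to eliminate every subgraph of G isomorphic to H, then the number of subgraphs of G isomorphic to H has a lower bound. A subgraph of a digraph is called H-free if no part of it is isomorphic to H.

Using the result of Alon and Shapira, this dissertation develops an algorithm that tests whether a digraph has an H-free subgraph on k vertices. As mentioned above, when we must choose between the two strong connectivity testers, this algorithm helps us choose the appropriate one.

The last part of this dissertation uses the strong connectivity tester to develop a tester for a group property. The generators of a group generate the whole group, and for an abelian group the number of generators is a topic of great interest. Using the strong connectivity tester, this dissertation develops an efficient algorithm that tests whether a set with a binary operation is an abelian group with fewer than k generators.

Keywords: Property testing; Randomized algorithm; Digraph; Strong connectivity; Abelian group; Generator.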

Abstract

Bender and Ron construct a restricted tester for the strong connectivity of digraphs (we call it the BR tester). We generalize the BR tester to test the strong connectivity of arbitrary digraphs.

For any digraph H and any digraph G that is far from every H-free digraph, Alon and Shapira prove a lower bound on the number of copies of H in G. After solving the problem of testing the strong connectivity of digraphs, we use Alon and Shapira’s result to construct a randomized algorithm for testing whether a digraph has an H-free k-induced subgraph.

Our strong connectivity tester has no restriction but must query the input more times than the restricted BR tester. If an input digraph satisfies the restrictions of the BR tester, then using the BR tester to test its strong connectivity is more efficient than using our strong connectivity tester. When we want to test the strong connectivity of a digraph, our randomized algorithm for testing digraphs with an H-free k-induced subgraph can help us determine which tester to use: the BR tester or our strong connectivity tester.

A generator set for a finite group is a subset of the group elements such that repeated multiplications of the generators alone can produce all the group elements. The number of generators of an abelian group is an important issue in many studies. In most cases, it is not easy to identify whether a group-like structure is an abelian group with k generators for a constant k. We construct an efficient randomized algorithm that, given a finite set with a binary operation, tests if it is an abelian group with a k-generator set. If k is not too large, the query complexity of our algorithm is polylogarithmic in the size of the groundset. Otherwise the query complexity is at most the square root of the size of the groundset.

Keywords: Property testing; Strong connectivity; H-free subgraph; Abelian group; Generator.


Chapter 1 Introduction

This world is full of decision problems, and we need to make decisions every day. In computer science, a decision problem asks if an object has a predetermined property.

Unfortunately, sometimes no fast algorithms exist that give the exact answer. In these cases, an approximate answer within a reasonable complexity is an attractive alternative.

A property-testing algorithm offers such answers. For a fixed property P and any object O, the property-testing algorithm determines whether O has property P or whether O is far from having property P (i.e., far from any object having P).

Its answer may, however, be arbitrary on objects falling between the two categories. For example, the object can be a graph and the property can be 3-colorability. The task should be performed by querying the object in as few places as possible. In the example, what we query is the existence of an edge between two vertices.

Many recent research results concern the testing of graph properties and group properties. In computer science, the general notion of property testing was first formulated by Rubinfeld and Sudan [64]. In their formulation, a property-testing algorithm for property P is given oracle access to the tested object. Distance between instances is measured in terms of the fraction of arguments in the domain.

Property testing emerges naturally in the context of program checking and probabilistically checkable proofs (PCP). Specifically, in the context of program checking, one may choose to test if the program satisfies certain properties before checking that it computes a specified function. This paradigm has been followed both in the theory of program checking [22, 64], and in practice where often programmers first test their programs by verifying that the programs satisfy properties that are known to be satisfied by the functions they compute. In the context of probabilistically checkable proofs, the property tested is being a codeword with respect to a specific code. This paradigm, explicitly introduced in Babai, Fortnow, Levin and Szegedy’s result, has shifted to testing Hadamard codes, and then to testing the long code [10, 13, 15, 16, 43, 44, 54, 68]. All of these papers have focused on property testing of algebraic properties such as linearity, multi-linearity and being a low-degree polynomial.

The number of generators of a group is an important issue in many studies. Knowing the number of generators of a group leads to a deeper understanding of its structure and may help us discover features of the group quickly.

A group in which every element commutes with its endomorphic images is called an E-group. A generator set of size k is called a k-generator set, and a group whose elements commute is called an abelian group. We know that every E-group with a 2-generator set is abelian and all E-groups with a 3-generator set are nilpotent of class at most 2 [2]. We also know that we need at least four generators to generate a finite non-abelian E-group [1]. A group with a 2-generator set must be isomorphic to a proper factor group [61]. The number of generators of a group has also been intensively studied [24, 25, 26, 48, 55, 58, 59].

This dissertation presents a method to combine the testing algorithms of digraphs and groups to test whether a group-like structure is an abelian group with a k-generator set. The first part of this dissertation is testing whether a digraph is strongly connected. Bender and Ron construct a restricted tester for the strong connectivity of digraphs (we call it the BR tester) [17]. Some instances do not satisfy the restrictions of the BR tester. We generalize the BR tester to test the strong connectivity of arbitrary digraphs.

For any digraph H and any digraph G that is far from every H-free digraph, Alon and Shapira prove a lower bound on the number of copies of H in G. After solving the problem of testing the strong connectivity of digraphs, we use Alon and Shapira’s result to construct a randomized algorithm for testing whether a digraph has an H-free k-induced subgraph.

Our strong connectivity tester has no restriction but must query the input more times than the restricted BR tester. If an input digraph satisfies the restrictions of the BR tester, then using the BR tester to test its strong connectivity is more efficient than using our strong connectivity tester. When we want to test the strong connectivity of a digraph, our randomized algorithm for testing digraphs with an H-free k-induced subgraph can help us determine which tester to use: the BR tester or our strong connectivity tester.

It is not easy to identify whether a group-like structure is an abelian group with a k-generator set for any given constant k. In the last part of this dissertation, we combine the testing algorithm for the abelian property of groups and the testing algorithm for the strong connectivity of digraphs to form the testing algorithm for a group-like structure being an abelian group with a k-generator set. Our method is to use the strong connectivity of Cayley graphs to test if a finite group-like structure has a k-generator set. Before that, we should test if the input group-like structure is an abelian group. Friedl, Ivanyos and Santha construct a tester which, given a finite group-like structure, tests if it is an abelian group (we call it the FIS tester) [37].

Combining the FIS tester and our strong connectivity tester, we can construct a testing algorithm for a group-like structure being an abelian group with a k-generator set.


Chapter 2 Background

2.1 Question of property testing

We are interested in the following question of property testing:

Let Π be a fixed property, and t be an instance. Our goal is to determine (possibly probabilistically) if t has property Π or if it is far from any instance that has property Π, where the distance between instances is measured with respect to the uniform probability distribution on the domain of t. Towards this goal, we are allowed to select some elements from t and query specific information about t on elements of our choice.

Let T be the class of instances that satisfy property Π. Then, testing property Π corresponds to testing membership in the class T. The two most relevant parameters to property testing are the distance, hereafter denoted $\epsilon$, and the desired confidence, denoted p. We require the tester to accept each instance in T and reject every instance that is more than $\epsilon$ away from any instance in T. We allow the tester to be probabilistic and to make incorrect positive and negative assertions with probability at most p. The complexity measure we focus on is the query complexity (the number of queries made). We believe that property testing is a natural notion whose relevance to applications goes beyond program checking, and whose scope goes beyond the realm of testing algebraic properties.

2.2 Property testing on combinatorial objects

Working within the above framework, we venture into the domain of combinatorial objects. In particular, we study testing group properties and graph properties, and demonstrate their relevance to the notions of approximation. We hope to derive extremely efficient algorithms for testing natural properties.

We only consider the uniform probability distribution on these combinatorial objects, as well as algorithms that only obtain random samples.

2.3 Property testing and learning theory

Our formulation of testing mimics the standard frameworks of learning theory. Suppose the property Π is a set of functions. In both property testing and learning theory, one is given access to an unknown target function f. However, there are two important differences between them. First, the goal of a learning algorithm is to find a good approximation to the target function f ∈ Π, whereas a testing algorithm should only determine whether the target function is in Π or far away from it. This makes the task of testing seem easier than that of learning. But that is misleading because a learning algorithm should perform well only when the target function belongs to Π, whereas a testing algorithm must perform well in such cases as well as on functions far away from Π.

Goldreich, Goldwasser and Ron show that the relation between learning and testing is nontrivial. On one hand, proper learning (i.e., when the hypothesis of the learning algorithm must belong to the same class as the target function) implies testing. On the other hand, there are function classes for which testing is harder than (nonproper) learning (i.e., when the hypothesis is not required to belong to the same class as the target function), provided $\mathrm{NP} \not\subseteq \mathrm{BPP}$ [41].

Chapter 3 Testing of Digraph Properties

3.1 Property testing on digraphs

We define property testing for digraphs next. Let Π be a property of digraphs, that is, a family of digraphs closed under isomorphism. A digraph G = (V, E) is $\epsilon$-close to having property Π if there exists a digraph G' = (V, E') having property Π such that the size of the symmetric difference between E and E' is at most $\epsilon\binom{|V|}{2}$. We say that a digraph G is $\epsilon$-far from having property Π if it is not $\epsilon$-close to having property Π.
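The distance used in this definition can be made concrete with a small sketch. The function names `edge_distance` and `within_eps` are illustrative and not from the dissertation, and vertices are assumed to be labeled 0..n-1. The code measures the distance to one particular digraph G'; a digraph is $\epsilon$-close to a property when some G' with the property is within that distance.

```python
from math import comb


def edge_distance(n, edges_a, edges_b):
    """Return |E symmetric-difference E'| / C(n, 2) for two digraphs on vertices 0..n-1."""
    symmetric_difference = set(edges_a) ^ set(edges_b)
    return len(symmetric_difference) / comb(n, 2)


def within_eps(n, edges_a, edges_b, eps):
    """True if the two edge sets differ in at most eps * C(n, 2) directed edges."""
    return len(set(edges_a) ^ set(edges_b)) <= eps * comb(n, 2)


if __name__ == "__main__":
    path = {(0, 1), (1, 2), (2, 3)}            # directed path, not strongly connected
    cycle = {(0, 1), (1, 2), (2, 3), (3, 0)}   # directed cycle, strongly connected
    print(edge_distance(4, path, cycle))       # one differing edge out of C(4, 2) = 6
```

Here the path becomes strongly connected after adding a single edge, so it is (1/6)-close to strong connectivity under this definition.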

An $\epsilon$-tester (or simply a tester) for a digraph property Π is a randomized algorithm that is given a size parameter N, a distance parameter $\epsilon$ and the ability to make queries as to whether a directed edge of the input digraph G with N vertices exists. The total number of queries is called the query complexity of the tester. Let $\{g_i\}$ be the set of digraphs with N vertices that satisfy Π. The algorithm needs to distinguish with probability at least 2/3 between the case of G having Π and the case of G differing from any $g_i$ in at least $\epsilon\binom{N}{2}$ edges [7]. In the latter case, G is said to be $\epsilon$-far from property Π. More specifically, T is an $\epsilon$-tester for Π if for every G = (V, E) and every $\epsilon$, the following two conditions hold:

(1) if G has property Π, then Pr[T accepts G] ≥ 2/3;

(2) if G is $\epsilon$-far from having property Π, then Pr[T accepts G] ≤ 1/3.

The probability 2/3 can be replaced by any constant smaller than 1 because the algorithm can be repeated if necessary. A graph property is testable if the property has a tester and the total number of queries is o(N²).
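The remark about repeating the algorithm can be illustrated with a short sketch. It assumes independent runs combined by majority vote, which is one standard way to amplify a two-sided tester; the names `majority_error` and `amplify` are illustrative and not from the dissertation. For a one-sided tester, rejecting as soon as any run rejects would serve the same purpose.

```python
from math import comb


def majority_error(p_err, rounds):
    """Probability that more than half of `rounds` independent runs err,
    when each run errs independently with probability p_err."""
    threshold = rounds // 2 + 1
    return sum(
        comb(rounds, j) * p_err**j * (1 - p_err) ** (rounds - j)
        for j in range(threshold, rounds + 1)
    )


def amplify(tester, rounds):
    """Wrap an accept/reject tester in a majority vote over `rounds` runs (rounds odd)."""
    def amplified(instance):
        accepts = sum(1 for _ in range(rounds) if tester(instance))
        return 2 * accepts > rounds
    return amplified


if __name__ == "__main__":
    for rounds in (1, 5, 15, 45):
        print(rounds, majority_error(1 / 3, rounds))
```

With a per-run error of 1/3, five runs already bring the majority-vote error down to 51/243, and the error keeps shrinking as the (odd) number of runs grows.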

3.2 Research work related to graph property testing

A testing algorithm for graph property Π can make queries on the incidence relations of vertices in an input graph G. Property Π is Ω(N²)-evasive if there is no deterministic testing algorithm with query complexity o(N²) that can correctly decide if the input has Π. Holt and Reingold are the first to consider the complexity of recognizing graph properties from their adjacency matrix representations [47]. They show that the graph properties of connectivity and the existence of cycles are both Ω(N²)-evasive. An important open problem in this area is the Aanderaa-Rosenberg conjecture: Every nontrivial monotone graph property without self-loops is $\binom{N}{2}$-evasive [23, 29, 47, 53]. Rivest and Vuillemin resolve a weaker version of the Aanderaa-Rosenberg conjecture [63]. The weaker version says that every nontrivial monotone graph property has decision tree complexity Ω(N²).

The testing of graph properties is pioneered by Goldreich, Goldwasser and Ron [41]. They show that all graph properties describable by the existence of a partition of a certain type are testable. For a fixed digraph H with at least one edge, let $P_H$ denote the property of the input digraph being H-free. In other words, the digraph G has $P_H$ if and only if it contains no subgraphs isomorphic to H. Alon and Shapira prove that $P_H$ is testable with a total number of queries bounded only by a function of $\epsilon$, independent of N [7]. This result has been improved later by Alon and Shapira [6].

Alon, Fischer, Krivelevich and Szegedy show that every first-order undirected graph property without a quantifier alternation of type "∀∃" has $\epsilon$-testers whose query complexity is independent of the size of the input graph [4]. More recently, Alon, Fischer, Newman and Shapira prove a very general result for undirected graphs, which says that the property defined by having any given Szemerédi-partition is testable with a constant number of queries [5]. Moreover, they obtain a purely combinatorial characterization of the graph properties that are testable with a constant number of queries.

The testing of other graph and combinatorial properties has also been intensively studied [30, 33, 40, 42, 51].

3.3 Reduction between group properties and digraph properties

In this section, we define Cayley graphs and introduce the reduction between group properties and digraph properties.

Definition 1 ([46]) Let G be a group, ◦ be the group multiplication and S be a subset of the group’s elements not containing the identity element. The Cayley graph associated with S is defined as the digraph having one vertex associated with each element of G and a directed edge (g, h) whenever g ◦ h⁻¹ ∈ S.

The properties of Cayley graphs have been extensively studied in graph theory. These properties are used to develop algebraic settings for studying certain structural and algorithmic properties of the interconnection networks that underlie parallel architectures, including the hypercube, butterfly, cube-connected cycles, multiple rings and star networks [3, 8, 21, 27, 65]. Cayley graphs have also been used to study the gossiping problem in communication networks [20].

We say that a digraph is strongly connected if there is a directed path from every vertex u in the digraph to every other vertex v.

Theorem 2 ([9]) The Cayley graph associated with a subset of a group’s elements (but not containing the identity element) is strongly connected iff the subset generates the group.
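As a small illustration of Definition 1 and Theorem 2, the sketch below builds the Cayley graph of the cyclic group (Z_n, +) and checks strong connectivity by forward and backward search. The choice of Z_n, the function names, and the gcd criterion used for comparison (a subset generates Z_n exactly when the gcd of n and its elements is 1) are assumptions made for this example, not part of the dissertation.

```python
from collections import deque
from functools import reduce
from math import gcd


def cayley_graph_zn(n, gens):
    """Cayley digraph of (Z_n, +): edge (g, h) whenever (g - h) mod n is in gens."""
    gens = {s % n for s in gens}
    if 0 in gens:
        raise ValueError("the subset must not contain the identity element")
    return {g: [h for h in range(n) if (g - h) % n in gens] for g in range(n)}


def reachable(adj, start):
    """Set of vertices reachable from `start` by directed paths (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(adj):
    """A digraph is strongly connected iff one vertex reaches, and is reached by, all."""
    reverse = {u: [] for u in adj}
    for u, outs in adj.items():
        for v in outs:
            reverse[v].append(u)
    root = next(iter(adj))
    return len(reachable(adj, root)) == len(adj) == len(reachable(reverse, root))


def generates_zn(n, gens):
    """Independent criterion for Z_n: gens generate the group iff gcd(n, gens) = 1."""
    return reduce(gcd, gens, n) == 1


if __name__ == "__main__":
    for gens in ({1}, {2}, {2, 3}, {2, 4}):
        graph = cayley_graph_zn(6, gens)
        print(sorted(gens), is_strongly_connected(graph), generates_zn(6, gens))
```

For Z_6 the subsets {1} and {2, 3} generate the group and give strongly connected Cayley graphs, while {2} and {2, 4} do neither.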

Our method relies on the strong connectivity of Cayley graphs to test if a finite group-like structure s = (Γ, ◦, i, 1) has a k-generator set. A subset of a groundset or a vertex set with size k will be called a k-subset from now on. By Theorem 2, for an input group-like structure s = (Γ, ◦, i, 1), if we can test whether there exists a k-subset of Γ with a corresponding strongly connected Cayley graph, then we can test whether s has a k-generator set. In the next chapter, we develop an algorithm for testing strong connectivity of digraphs.


Chapter 4 Testing Strong Connectivity on Digraphs

4.1 Strongly connected component

In this chapter, we develop an algorithm for testing strong connectivity of digraphs. In the following chapters, we will rely on the strong connectivity of Cayley graphs to test if a finite group-like structure s = (Γ, ◦, i, 1) has a k-generator set. For a digraph G = (V, E) with indegree and outdegree bounded by d < |V|, Bender and Ron develop a tester for the strong connectivity of digraphs (we call it the BR tester) [17].

Theorem 3 ([17]) (1) If G is $\epsilon$-far from being strongly connected with indegree and outdegree bounded by d, then the BR tester will reject it with probability at least 2/3. (2) The BR tester has one-sided error. (3) The query complexity is $O(1/\epsilon)$.

On the other hand, suppose there is no bound on the indegree and outdegree of the digraph G. In this chapter we construct another tester for the strong connectivity of digraphs that is slightly different from the BR tester. To begin with, testing strong connectivity is trivial when the distance parameter is greater than 2/n for the following reason: We can always make a digraph connected by adding at most n − 1 edges. Hence every digraph with n vertices is (2/n)-close to being strongly connected because $(2/n)\binom{n}{2} = n - 1$. On the other hand, $\epsilon$ should be greater than $1/\binom{n}{2}$ for, otherwise, $\epsilon\binom{n}{2} < 1$. To make sure that the problem is not trivial, we assume $1/\binom{n}{2} < \epsilon < 2/n$ from now on.
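As a quick check of this range, the identity behind the upper end and a worked instance (n = 10 is chosen arbitrarily for illustration) read:

\[
\frac{2}{n}\binom{n}{2} = \frac{2}{n}\cdot\frac{n(n-1)}{2} = n-1,
\qquad\text{and for } n = 10:\quad
\frac{1}{\binom{10}{2}} = \frac{1}{45} < \epsilon < \frac{2}{10} = \frac{1}{5}.
\]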

A strongly connected component of a digraph G = (V, E) is a maximal subgraph H = (V', E') such that there is a directed path from each vertex p ∈ V' to every other vertex q ∈ V'. Denote the set of strongly connected components of G by $C = \{C_1, C_2, \ldots, C_m\}$. For vertices $u \in C_i$ and $v \in C_j$, $i \ne j$, such that e = (u, v) ∈ E, we call e an outgoing edge of $C_i$ and an incoming edge of $C_j$. We call a strongly connected component a source if it has only outgoing edges, a sink if it has only incoming edges, an isolation if it has neither outgoing nor incoming edges, and a transferrer if it has both outgoing and incoming edges.
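The four kinds of components can be computed directly from these definitions. The sketch below is only illustrative: it finds components through a simple reachability closure, which is convenient for small examples but is not a query-efficient method, and the function names are not from the dissertation.

```python
def strongly_connected_components(n, edges):
    """Components of a digraph on vertices 0..n-1 via a reachability closure."""
    reach = [[u == v for v in range(n)] for u in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for w in range(n):                      # Warshall-style transitive closure
        for u in range(n):
            if reach[u][w]:
                for v in range(n):
                    if reach[w][v]:
                        reach[u][v] = True
    comp_of, comps = [None] * n, []
    for u in range(n):
        if comp_of[u] is None:
            members = [v for v in range(n) if reach[u][v] and reach[v][u]]
            for v in members:
                comp_of[v] = len(comps)
            comps.append(members)
    return comps, comp_of


def classify_components(n, edges):
    """Label each component as source, sink, isolation, or transferrer."""
    comps, comp_of = strongly_connected_components(n, edges)
    has_out = [False] * len(comps)
    has_in = [False] * len(comps)
    for u, v in edges:
        if comp_of[u] != comp_of[v]:        # edge between two different components
            has_out[comp_of[u]] = True
            has_in[comp_of[v]] = True
    labels = {
        (True, False): "source",
        (False, True): "sink",
        (False, False): "isolation",
        (True, True): "transferrer",
    }
    return [(comps[i], labels[(has_out[i], has_in[i])]) for i in range(len(comps))]


if __name__ == "__main__":
    edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (3, 4)]
    for members, kind in classify_components(6, edges):
        print(members, kind)
```

In the example, {0, 1} is a source, {2, 3} a transferrer, {4} a sink, and the untouched vertex 5 an isolation.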

Lemma 4 If a digraph G with n vertices is $\epsilon$-far from being strongly connected, then the total number of sources, sinks, and isolations in G exceeds $\epsilon n^2$.

Proof: Assume the claim of the lemma is wrong and proceed to obtain a contradiction. Let the set $\mathcal{T}$ contain all transferrers and the set $\mathcal{R}$ consist of the remaining strongly connected components. We divide the problem into two cases.

Case 1. $\mathcal{T}$ is empty.

A strongly connected component in G is either a source, a sink, or an isolation, and hence a member of $\mathcal{R}$. Pick a vertex from each strongly connected component of G. As G has $|\mathcal{R}|$ strongly connected components, $|\mathcal{R}|$ vertices are chosen. Turn these $|\mathcal{R}|$ vertices into a directed cycle by adding at most $|\mathcal{R}|$ directed edges to G. Now, for any ordered pair of vertices (u, v) in V, there is a directed path from u to v; hence G is strongly connected. Recall that an $\epsilon$-far digraph differs from the class of strongly connected digraphs by at least $\epsilon n^2 + 1$ edges. However this is a contradiction, since $|\mathcal{R}| \le \epsilon n^2$. See Fig. 4.1 for illustration.

[Figure 4.1: Testing strong connectivity of digraphs. A vertex is chosen from each strongly connected component, and the chosen vertices are turned into a directed cycle.]

Case 2. $\mathcal{T}$ is not empty.

A chain of strongly connected components $(C_1, C_2, \ldots, C_n)$ consists of strongly connected components such that between any two adjacent strongly connected components $C_i$ and $C_{i+1}$, there is a directed edge from a vertex of $C_i$ to a vertex of $C_{i+1}$.

For any $T \in \mathcal{T}$, there is a longest chain of strongly connected components S in $\mathcal{T}$ containing T (i.e., all members of S are transferrers, and T is one of them). Let $S^h$ be the head of the chain S. $S^h$ is a strongly connected component with both incoming and outgoing edges because $S^h \in \mathcal{T}$. There is no directed edge from other strongly connected components in $\mathcal{T}$ to $S^h$ for, otherwise, S would not be the longest chain containing T. Therefore, all the incoming edges of $S^h$ are outgoing edges of some sources. By the same token, all the outgoing edges of the tail $S^t$ of S are incoming edges of some sinks. Consequently, for any $T \in \mathcal{T}$, we can always find a vertex p from a source and a vertex q from a sink such that there is a directed path from p to q passing through a vertex of T. As a result, if all the sources and sinks are turned into one single strongly connected component, the transferrers will become part of the same component, too.

Therefore, if we can turn all members of $\mathcal{R}$ into one single strongly connected component R', then for all $T_i, T_j \in \mathcal{T}$ and all vertices $x \in T_i$ and $y \in T_j$, there exists a directed path from x through some vertices in R' to y. So we can ignore the transferrers and concentrate on how to make all members of $\mathcal{R}$ become one single strongly connected subgraph. We have thus reduced this case to Case 1, where $\mathcal{T}$ is empty. The number of directed edges we need to add to G is thus at most $|\mathcal{R}|$, too. If $|\mathcal{R}|$ is no greater than $\epsilon n^2$, we get a contradiction. Q.E.D.
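The construction used in this proof can be tried on a small digraph. The sketch below picks one representative from every component that is not a transferrer, links the representatives into a directed cycle, and checks that the result is strongly connected. It is an illustration under the assumption that vertices are labeled 0..n-1; the helper names are hypothetical and the closure-based component search is chosen for brevity, not efficiency.

```python
def transitive_closure(n, edges):
    """reach[u][v] is True iff there is a directed path from u to v (or u == v)."""
    reach = [[u == v for v in range(n)] for u in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for w in range(n):
        for u in range(n):
            if reach[u][w]:
                for v in range(n):
                    if reach[w][v]:
                        reach[u][v] = True
    return reach


def is_strongly_connected(n, edges):
    return all(all(row) for row in transitive_closure(n, edges))


def cycle_through_non_transferrers(n, edges):
    """Edges of a directed cycle through one vertex of each source, sink, and isolation."""
    reach = transitive_closure(n, edges)
    # Identify each component by its smallest vertex, which doubles as its representative.
    comp_of = [min(v for v in range(n) if reach[u][v] and reach[v][u]) for u in range(n)]
    has_out, has_in = set(), set()
    for u, v in edges:
        if comp_of[u] != comp_of[v]:
            has_out.add(comp_of[u])
            has_in.add(comp_of[v])
    reps = sorted(c for c in set(comp_of) if not (c in has_out and c in has_in))
    if len(reps) < 2:
        return []
    return [(reps[i], reps[(i + 1) % len(reps)]) for i in range(len(reps))]


if __name__ == "__main__":
    edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (3, 4)]
    added = cycle_through_non_transferrers(6, edges)
    print(added, is_strongly_connected(6, edges + added))
```

In the example the transferrer {2, 3} receives no new edge, yet the digraph becomes strongly connected after three edges are added, one per source, sink, or isolation.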

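Lemma 5 A digraph $\epsilon$-far from being strongly connected must have at least $(\epsilon n^2)/2$ strongly connected components each containing fewer than $\left\lceil \frac{1}{n}\left(\frac{2}{\epsilon} - 1\right) \right\rceil$ vertices.

Proof: Assume the claim is wrong. Lemma 4 says that this digraph has at least $\epsilon n^2 + 1$ strongly connected components. Since every strongly connected component has at least one vertex,

\[
\begin{aligned}
n &\ge \left(\epsilon n^2 + 1 - \frac{\epsilon n^2}{2}\right)\cdot\left\lceil \frac{1}{n}\left(\frac{2}{\epsilon} - 1\right) \right\rceil + \frac{\epsilon n^2}{2}\cdot 1 \\
  &> \frac{\epsilon n^2}{2}\cdot\frac{1}{n}\left(\frac{2}{\epsilon} - 1\right) + \frac{\epsilon n^2}{2} \\
  &= \frac{\epsilon n}{2}\left(\frac{2}{\epsilon} - 1\right) + \frac{\epsilon n^2}{2} \\
  &= n - \frac{\epsilon n}{2} + \frac{\epsilon n^2}{2},
\end{aligned}
\]

a contradiction. Q.E.D.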

4.2 Tester construction

For a digraph with no bound on the indegree and outdegree of each vertex, the algorithm that tests whether it is strongly connected appears in Fig. 4.2 (we call it the CONN tester). As noted in Section 4.1, testing strong connectivity is trivial when the distance parameter is greater than 2/n. To make sure that the problem is not trivial, we assume $1/\binom{n}{2} < \epsilon < 2/n$. The following theorem analyzes the algorithm.

Theorem 6 (1) Our algorithm has one-sided error. (2) If G is $\epsilon$-far from being strongly connected, then our algorithm will reject it with probability at least 2/3. (3) The query complexity is $O\left(\frac{\log_{1-(\epsilon n)/2}(1/3)}{\epsilon}\right)$.

Let S = ∅, $m = \left\lceil \log_{1-(\epsilon n)/2}(1/3) \right\rceil$, $x = \left\lceil \frac{1}{n}\left(\frac{2}{\epsilon} - 1\right) \right\rceil$
while |S| < m do
    Pick an arbitrary vertex u from V and add it to S
    Perform BFS on G starting from u, always using a visited vertex’s incoming edges, and stop when we reach x vertices
    if we run out of new vertices then
        REJECT
    else
        Perform BFS starting from u, always using a visited vertex’s outgoing edges, and stop when we reach x vertices
        if we run out of new vertices then
            REJECT
        end if
    end if
end while
ACCEPT

Figure 4.2: An $\epsilon$-tester for the strong connectivity of digraphs in the case that there is no prior bound on the indegree and outdegree of each vertex.
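The tester in Fig. 4.2 can be sketched in Python as follows. This is a minimal sketch rather than the dissertation's implementation: the input is assumed to be an adjacency matrix `adj` with `adj[u][v]` true when the edge (u, v) exists, the start vertex is sampled uniformly at random (an assumption that matches the probability argument in the analysis), and the names `bounded_bfs` and `conn_tester` are illustrative.

```python
import math
import random
from collections import deque


def bounded_bfs(adj, start, limit, use_incoming):
    """Return True if a BFS from `start` reaches at least `limit` vertices.

    With use_incoming=True the search follows edges backwards (v -> u),
    otherwise forwards (u -> v). Each adj[...] lookup counts as one query.
    """
    n = len(adj)
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < limit:
        u = queue.popleft()
        for v in range(n):
            has_edge = adj[v][u] if use_incoming else adj[u][v]
            if has_edge and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) >= limit


def conn_tester(adj, eps, rng=random):
    """Accept (True) or reject (False) following the structure of Fig. 4.2."""
    n = len(adj)
    if not (1 / math.comb(n, 2) < eps < 2 / n):
        raise ValueError("eps must satisfy 1/C(n,2) < eps < 2/n")
    m = math.ceil(math.log(1 / 3) / math.log(1 - eps * n / 2))  # number of rounds
    x = math.ceil((2 / eps - 1) / n)                            # BFS size threshold
    for _ in range(m):
        u = rng.randrange(n)
        if not bounded_bfs(adj, u, x, use_incoming=True):
            return False
        if not bounded_bfs(adj, u, x, use_incoming=False):
            return False
    return True


if __name__ == "__main__":
    n = 10
    cycle = [[(v - u) % n == 1 for v in range(n)] for u in range(n)]
    empty = [[False] * n for _ in range(n)]
    print(conn_tester(cycle, 0.1), conn_tester(empty, 0.1))
```

On a directed cycle every bounded search succeeds, so the sketch always accepts, in line with one-sided error; on the edgeless digraph the first search already runs out of vertices and the sketch rejects.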


Proof: Since $1/\binom{n}{2} < \epsilon < 2/n$, we have x ≤ n. For a strongly connected digraph G, given an arbitrary vertex u of V, a BFS always reaches at least x vertices by starting from u. In other words, our algorithm never rejects G, a yes instance; hence it has only one-sided error.

For a no instance G, according to Lemma 5, there are at least $(\epsilon n^2)/2$ strongly connected components, each containing fewer than x vertices. The probability that an arbitrarily chosen vertex belongs to one of these $(\epsilon n^2)/2$ strongly connected components is at least $\frac{(\epsilon n^2)/2}{n} = (\epsilon n)/2$. Our algorithm outputs ACCEPT for a no instance only when the while-loop executes m times. The probability is at most

\[
\left[1 - \frac{\epsilon n}{2}\right]^m = \left[1 - \frac{\epsilon n}{2}\right]^{\left\lceil \log_{1-(\epsilon n)/2}(1/3) \right\rceil} \le \frac{1}{3}.
\]

Hence, G is rejected with probability at least 2/3.

Finding all of a vertex’s incoming (outgoing) neighbors takes at most n queries on the incoming (outgoing, respectively) edges given the adjacency matrix. Since the while-loop is repeated at most m times, the query complexity is bounded by:

\[
m \cdot 2 \cdot \left\lceil \frac{1}{n}\left(\frac{2}{\epsilon} - 1\right) \right\rceil \cdot n
< \left[\log_{1-(\epsilon n)/2}\frac{1}{3} + 1\right] \cdot \left[\frac{2}{n}\left(\frac{2}{\epsilon} - 1\right)\right] \cdot n
= O\left(\frac{\log_{1-(\epsilon n)/2}(1/3)}{\epsilon}\right).
\]

Q.E.D.
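To get a feel for the quantities in this analysis, the following sketch evaluates m, x, the acceptance bound for a no instance, and the query count m · 2 · x · n for given n and $\epsilon$. The function name and the returned dictionary are illustrative choices, and the sample values n = 100 and $\epsilon$ = 0.01 are arbitrary.

```python
import math


def conn_parameters(n, eps):
    """Evaluate the quantities used in the analysis of the CONN tester."""
    if not (1 / math.comb(n, 2) < eps < 2 / n):
        raise ValueError("eps must satisfy 1/C(n,2) < eps < 2/n")
    hit = eps * n / 2                                   # chance of sampling a small component
    m = math.ceil(math.log(1 / 3) / math.log(1 - hit))  # rounds of the while-loop
    x = math.ceil((2 / eps - 1) / n)                    # BFS size threshold
    return {
        "m": m,
        "x": x,
        "accept_bound": (1 - hit) ** m,                 # bound on wrongly accepting
        "queries": m * 2 * x * n,                       # two searches of x vertices, n queries each
    }


if __name__ == "__main__":
    print(conn_parameters(100, 0.01))
```

For these sample values the loop runs twice, each search stops at two vertices, and the acceptance bound for a no instance is 0.25, below the required 1/3.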

According to Theorem 6, the query complexity of the CONN tester is much higher than that of the BR tester. If the indegree and the outdegree of an input digraph are both bounded by a given constant, we can use the BR tester to test the strong connectivity of the input in order to reduce the query complexity; otherwise, we use the CONN tester. In the next chapter, we investigate a method to test whether a digraph contains H-free k-induced subgraphs. If we do not know the upper bounds of the indegree and the outdegree of an input digraph, our result in the next chapter can help us determine which algorithm to use to test the strong connectivity of the input.

Chapter 5 Testing Whether a Digraph Contains H-free k-induced Subgraphs

5.1 Existence of H-free k-induced subgraphs is Ω(N²)-evasive

In this section, we show that the query complexity of any deterministic algorithm for the existence of H-free k-induced subgraphs is Ω(N²).

First, we need some results concerning Turán numbers. For any integer N and a fixed graph H, let ex(N, H) denote the maximum number of edges that an N-vertex graph may have if it contains no isomorphic copy of H. This is the Turán number of H. Furthermore, we will denote by $b_{r,s}$ the complete undirected bipartite graph between a set of r vertices and another set of s vertices. The following fact is well-known.

Fact 7 ([54]) For r ≤ s, $\mathrm{ex}(N, b_{r,s}) = O(N^{2-(1/r)})$.
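For very small N, the Turán number in Fact 7 can be found by exhaustive search. The sketch below does this for $b_{2,2}$ (a 4-cycle), using the observation that a graph contains $b_{2,2}$ exactly when some pair of vertices has two common neighbors. The function names are illustrative, the search is exponential and only meant for N up to about 6, and the comparison with $N^{3/2}$ uses constant 1 purely for illustration since the fact only gives an asymptotic bound.

```python
from itertools import combinations


def has_b22(n, edge_set):
    """True if the undirected graph contains a copy of b_{2,2} (a 4-cycle)."""
    neighbors = [set() for _ in range(n)]
    for u, v in edge_set:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Two vertices with two common neighbors span a complete bipartite b_{2,2}.
    return any(len(neighbors[u] & neighbors[v]) >= 2
               for u, v in combinations(range(n), 2))


def ex_b22(n):
    """Largest number of edges of an n-vertex graph with no copy of b_{2,2}."""
    pairs = list(combinations(range(n), 2))
    for size in range(len(pairs), -1, -1):
        for edge_set in combinations(pairs, size):
            if not has_b22(n, edge_set):
                return size
    return 0


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, ex_b22(n), n ** 1.5)
```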

If we replace the undirected edges of $b_{r,s}$ by directed edges with an arbitrary direction, a complete bipartite digraph $d_{r,s}$ results. The next theorem shows that it is Ω(N²)-evasive to determine if there is a $d_{r,s}$-free k-induced subgraph in a digraph.

In our model, whenever an algorithm queries a pair of vertices x, y in the input graph, it actually means that the algorithm queries the existence of edges (x, y) and (y, x) simultaneously. For a set S, we say that a subset T ⊆ S is a k-subset of S if |T | = k.

Suppose a digraph G contains a subgraph isomorphic to a digraph H. Then we say G contains a copy of H.
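The notions of a copy of H and of an H-free k-induced subgraph can be checked by brute force on small inputs. The sketch below is an illustration only: it enumerates all k-subsets and all injective placements of H, so it is exponential and unrelated to the query-efficient testers discussed in this chapter. H is assumed to be given on vertices 0..h-1, and the function names are not from the dissertation.

```python
from itertools import combinations, permutations


def contains_copy(vertices, edges, h_size, h_edges):
    """True if the digraph (vertices, edges) has a subgraph isomorphic to H."""
    edges = set(edges)
    for image in permutations(vertices, h_size):   # injective map from H into G
        if all((image[a], image[b]) in edges for a, b in h_edges):
            return True
    return False


def has_h_free_k_induced_subgraph(n, edges, k, h_size, h_edges):
    """True if some k vertices of G induce a subgraph containing no copy of H."""
    edges = set(edges)
    for subset in combinations(range(n), k):
        chosen = set(subset)
        induced = {(u, v) for u, v in edges if u in chosen and v in chosen}
        if not contains_copy(subset, induced, h_size, h_edges):
            return True
    return False


if __name__ == "__main__":
    two_path = [(0, 1), (1, 2)]                     # H: a directed path on 3 vertices
    four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    matching = [(0, 1), (2, 3)]
    print(has_h_free_k_induced_subgraph(4, four_cycle, 3, 3, two_path))
    print(has_h_free_k_induced_subgraph(4, matching, 3, 3, two_path))
```

Every three vertices of the directed 4-cycle induce a directed 2-path, so it has no H-free 3-induced subgraph, while the two disjoint edges do.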

Theorem 8 For any constant ρ < 1, k < N/2 and any complete bipartite digraph $d_{r,s}$, no algorithm can determine whether a digraph contains a $d_{r,s}$-free k-induced subgraph with query complexity $\le \rho\binom{k}{2}$ if k = λN with λ being a constant.

Proof: Suppose there exists an algorithm A that determines if a digraph contains a $d_{r,s}$-free k-induced subgraph with $\rho\binom{k}{2}$ queries. For the rest of the proof, assume k = O(N) and k is large enough so that

\[
(1 - \rho)\binom{k}{2} \ge \mathrm{ex}(k, b_{r,s}) = O(k^{2-(1/r)}). \tag{1}
\]

Start with a digraph $G_1$ with N vertices that contains no copies of $d_{r,s}$ (this is easy to construct). Let $G_1$ be the input of A. Obviously, all k-induced subgraphs of $G_1$ are $d_{r,s}$-free. Let $G_2 = (V_2, E_2)$ be a graph with N isolated vertices. Every time A queries a pair of vertices x, y in $G_1$, we add that edge to $G_2$ if there is an edge between them. When A stops, the resulting $G_2$ has no k-induced subgraphs which contain $d_{r,s}$, just like $G_1$. For those vertex pairs of $G_1$ that are not queried by A, we add an edge (but without the directions) to $G_2$. For each k-induced subgraph of $G_2$, at least $(1 - \rho)\binom{k}{2}$ undirected edges are added. According to Fact 7, every k-induced subgraph of $G_2$ must contain a copy of $b_{r,s}$ with the undirected edges alone because of Eq. (1).

Now, we select a k-induced subgraph $K_1$ in $G_2$ and replace one copy of $b_{r,s}$ in $K_1$ by $d_{r,s}$. Let $V_{b,1}$ be the vertex set of this copy of $d_{r,s}$, and define $h = |V_{b,1}| = r + s$. For each subset of $V_2$ with size k that contains $V_{b,1}$, its induced subgraph has a copy of $d_{r,s}$ too. There are $\binom{N-h}{k-h}$ such k-subsets of V that contain $V_{b,1}$. Let k/N = λ. Recall that λ is a constant. Now, the ratio of the number of all such k-subsets to the number of k-induced subgraphs of $G_2$ is $\binom{N-h}{k-h}/\binom{N}{k}$. Note that

\[
\frac{\binom{N-h}{k-h}}{\binom{N}{k}} = \frac{k(k-1)\cdots(k-h+1)}{N(N-1)\cdots(N-h+1)}.
\]

As h is a constant and k < N/2, it is not hard to prove that there is a number m > 0 such that for every N > m it holds that

\[
\frac{k}{N} > \frac{k-1}{N-1} > \cdots > \frac{k-h+2}{N-h+2} > \frac{k-h+1}{N-h+1}
= \frac{(k/N) - (h/N) + 1/N}{1 - (h-1)/N} > \frac{\lambda}{1+\lambda}.
\]

Thus if N is large enough,

\[
\frac{\binom{N-h}{k-h}}{\binom{N}{k}} = \frac{k(k-1)\cdots(k-h+1)}{N(N-1)\cdots(N-h+1)} > \left(\frac{\lambda}{1+\lambda}\right)^h.
\]
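The identity and the lower bound in this step can be checked numerically with exact rational arithmetic. The sketch below is illustrative only; the function names and the sample values N = 100, k = 30, h = 4 (so λ = 3/10) are arbitrary choices for the example.

```python
from fractions import Fraction
from math import comb


def subset_ratio(N, k, h):
    """Fraction of k-subsets of an N-set that contain a fixed h-subset."""
    return Fraction(comb(N - h, k - h), comb(N, k))


def falling_product(N, k, h):
    """The product k(k-1)...(k-h+1) / (N(N-1)...(N-h+1))."""
    product = Fraction(1)
    for j in range(h):
        product *= Fraction(k - j, N - j)
    return product


if __name__ == "__main__":
    N, k, h = 100, 30, 4
    lam = Fraction(k, N)
    print(subset_ratio(N, k, h) == falling_product(N, k, h))
    print(float(subset_ratio(N, k, h)), float((lam / (1 + lam)) ** h))
```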

We conclude that at least $\left(\frac{\lambda}{1+\lambda}\right)^h\binom{N}{k}$ k-induced subgraphs contain a copy of $d_{r,s}$. Next we select another k-induced subgraph $K_2 = (V', E')$ with $V' \cap V_{b,1} = \emptyset$. It is worth noting that $K_2$ also has a copy of $b_{r,s}$, and the vertex set of this copy of $b_{r,s}$ is $V_{b,2}$. As before, we replace this copy of $b_{r,s}$ in $K_2$ by $d_{r,s}$. There are $\binom{N-2h}{k-h}$ such k-subsets of V that contain $V_{b,2}$. The ratio of the number of all such k-subsets to the number of k-induced subgraphs of $G_2$ is $\binom{N-2h}{k-h}/\binom{N}{k}$. Again, for N large enough,

\[
\lim_{N\to\infty} \frac{\binom{N-2h}{k-h}}{\binom{N}{k}} = \lim_{N\to\infty} \frac{\binom{N-h}{k-h}}{\binom{N}{k}} > \left(\frac{\lambda}{1+\lambda}\right)^h.
\]

We claim that in general, for every constant i,

\[
\frac{\binom{N-ih}{k-h}}{\binom{N}{k}}
= \frac{k(k-1)\cdots(k-h+1)}{N(N-1)\cdots(N-h+1)} \cdot \frac{(N-k)\cdots(N-k-(i-1)h+1)}{(N-h)\cdots(N-ih+1)}
> \left(\frac{\lambda}{1+\lambda}\right)^h. \tag{2}
\]

To verify this, recall that as we showed before, k(k − 1) · · · (k − h + 1) N(N − 1) · · · (N − h + 1) >

µ λ

1 + λ

¶_h .

As for

\[ \frac{(N-k)\cdots(N-k-(i-1)h+1)}{(N-h)\cdots(N-ih+1)}, \]

since

\[ \frac{N-k}{N-h} > \frac{N-k-1}{N-h-1} > \cdots > \frac{N-k-(i-1)h+1}{N-ih+1}, \]

we have

\[ \frac{(N-k)(N-k-1)\cdots(N-k-(i-1)h+1)}{(N-h)(N-h-1)\cdots(N-ih+1)} > \left(\frac{N-k-(i-1)h+1}{N-ih+1}\right)^{(i-1)h}. \]

Now, with $k = \lambda N$, it is easy to see that

\[ \frac{N-k-(i-1)h+1}{N-ih+1} > \frac{N-k-(i-1)h+1}{N} > 1 - 2\lambda, \]

where the last inequality is due to $k > (i-1)h - 1$. Hence, when we repeat the above process $i$ times, at least

\[ \left[(1-2\lambda) + (1-2\lambda)^2 + \cdots + (1-2\lambda)^{(i-1)h}\right] \left(\frac{\lambda}{1+\lambda}\right)^h \binom{N}{k} \tag{3} \]

$k$-induced subgraphs contain a copy of $d_{r,s}$. Recall that $k < N/2$. Hence $2\lambda < 1$ and formula (3) is less than $\frac{1}{2\lambda}\left(\frac{\lambda}{1+\lambda}\right)^h\binom{N}{k}$.
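The bound on formula (3) rests on the geometric series $\sum_{t \ge 1} (1-2\lambda)^t = (1-2\lambda)/(2\lambda) < 1/(2\lambda)$ for $0 < 2\lambda < 1$. A minimal Python sketch, with a hypothetical helper name and illustrative values of $\lambda$, checks this exactly for finite prefixes of the series.

```python
from fractions import Fraction


def geometric_prefix(lam, terms):
    """Sum of (1 - 2*lam)^t for t = 1, ..., terms, computed exactly."""
    r = 1 - 2 * lam
    return sum(r ** t for t in range(1, terms + 1))


def series_cap(lam):
    """The cap 1 / (2*lam) used to bound formula (3)."""
    return 1 / (2 * lam)
```

Every prefix stays below the cap, for instance with `lam = Fraction(1, 5)` and 50 terms.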

Since $\lambda$, $h$ and $(\lambda/(1+\lambda))^{-h}$ are constants, we can repeat this process $2\lambda(\lambda/(1+\lambda))^{-h}$ times such that $V_{b,i} \cap V_{b,j} = \emptyset$ for $i \ne j$ and $N$ large enough. After having repeated this process that many times, we select $2\lambda(\lambda/(1+\lambda))^{-h}h < N$ distinct vertices from $V$ for $N$ large enough, and, by Eq. (2), the ratio of the number of $K_{2\lambda(\lambda/(1+\lambda))^{-h}}$ to the number of all $k$-induced subgraphs of $G_2$ will be at least $2\lambda(\lambda/(1+\lambda))^{-h}$. The number of $k$-induced subgraphs that contain a copy of $d_{r,s}$ then is at least

\[ 2\lambda\left(\frac{\lambda}{1+\lambda}\right)^{-h} \cdot \frac{(\lambda/(1+\lambda))^h}{2\lambda}\binom{N}{k} = \binom{N}{k}. \]

In other words, after we repeat this process $2\lambda(\lambda/(1+\lambda))^{-h}$ times and remove the remaining undirected edges, all $k$-induced subgraphs of $G_2$ will have a copy of $d_{r,s}$. This digraph $G_2$ contains, therefore, no $H$-free $k$-induced subgraph. However, $A$ cannot distinguish between $G_1$ and $G_2$ because we have only changed $G_2$'s unqueried edges. So, $A$ will accept $G_2$, which is a contradiction. Q.E.D.

5.2 Tester construction

Fix a digraph H with h vertices and m ≥ 1 edges. Recall that P_k,H, where k ≥ h, denotes the property that G contains an H-free k-induced subgraph. We will show that property P_k,H is testable with a query complexity independent of the input size.

A set with size $n$ will be called an $n$-set, and a multiset with size $n$ will be called an $n$-multiset. There is a function $f(\epsilon; H)$ with the following properties, which will be critical to our analysis later.

Theorem 9 ([7]) Let $H$ be a fixed digraph with $h$ vertices and $D$ be a digraph with $N$ vertices. If at least $\epsilon N^2$ edges have to be removed from $D$ to make it $H$-free, then $D$ contains at least $f(\epsilon; H)N^h$ copies of $H$.

The following corollary is immediate.

Corollary 10 Let $H$ be a fixed digraph with $h$ vertices and $m$ edges, $D$ be a digraph with $N$ vertices and $\sigma = \binom{\binom{h}{2}}{m}$. If at least $\epsilon N^2$ edges have to be removed from $D$ to make it $H$-free, then $D$ contains at least $f(\epsilon; H)N^h/\sigma$ $h$-sets whose induced subgraphs contain copies of $H$.


Suppose the input $N$-vertex digraph $G = (V, E)$ is $\epsilon$-far from having property $P_{k,H}$. Corollary 10 tells us that $G$ must contain at least $f(\epsilon; H)N^h/\binom{\binom{h}{2}}{m}$ $h$-sets whose induced subgraphs contain copies of $H$. So to test property $P_{k,H}$ on $G$, our idea is to randomly select many $h$-sets from $V$. Suppose $G$ contains an $H$-free $k$-induced subgraph, say $(V_k, E_k)$. Then with enough $h$-sets from $V$, at least one of them is expected to be a subset of $V_k$ with high probability. To verify if this is the case, we will check if an $h$-set $S$ satisfies $S \subseteq V_k$ in 2 steps. First, we check the induced subgraph of $S$. When $S \subseteq V_k$, the induced subgraph of $S$ contains no copies of $H$.

If the induced subgraph of $S$ contains no copies of $H$, we randomly add a number of other vertices to $S$ (the number will be determined later) and check if there is a subset of $S$ (with a size to be determined later) whose induced subgraph contains no copies of $H$. If $S \subseteq V_k$, we expect that $S$ will pass both tests with high probability.

Thus, $G$ will be accepted by our algorithm with high probability. On the other hand, suppose $G$ is $\epsilon$-far from any digraph which has property $P_{k,H}$. Then we expect to find a copy of $H$ in all the induced subgraphs of the above-mentioned $h$-sets $S$ with high probability. Our algorithm is detailed in Fig. 5.1.
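The two-step check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the edge oracle `has_edge`, the brute-force helper `contains_copy`, and the function name `pkh_tester` are hypothetical, and the parameters `kappa` and `theta` are passed in as small integers rather than derived from $f(\epsilon; H)$, whose exact value is not computed here. The step comments refer to the numbered steps of Fig. 5.1.

```python
import math
import random
from itertools import combinations, permutations


def contains_copy(has_edge, vertices, h_edges, h):
    """Brute force: does the subgraph induced by `vertices` contain a copy of H?

    H is assumed to have vertex set {0, ..., h-1} and edge list `h_edges`.
    """
    distinct = list(set(vertices))
    for image in permutations(distinct, h):
        if all(has_edge(image[a], image[b]) for a, b in h_edges):
            return True
    return False


def pkh_tester(has_edge, N, k, eps, h, h_edges, kappa, theta, rng=random):
    """Return True for ACCEPT and False for REJECT."""
    if k < math.sqrt(eps) * N:                          # steps 1-3
        return True
    lam = k / N
    p = math.ceil(6 * theta * h / lam)                  # sample count of step 8
    for _ in range(kappa):                              # step 5
        S = rng.sample(range(N), h)                     # step 6
        if contains_copy(has_edge, S, h_edges, h):      # step 7 fails
            continue
        others = [v for v in range(N) if v not in S]
        xs = [rng.choice(others) for _ in range(p)]     # step 8, with replacement
        for idx in combinations(range(p), theta * h):   # steps 9-10
            S_j = [xs[i] for i in idx]
            if not contains_copy(has_edge, S_j + S, h_edges, h):
                return True                             # steps 11-12
    return False                                        # step 17
```

With $H$ a single directed edge, an edgeless digraph is always accepted and a complete digraph is always rejected once $k \ge \sqrt{\epsilon}N$, which makes the sketch easy to exercise deterministically. The enumeration of all $(\theta h)$-multisets is exponential in $\theta h$, so the sketch is only practical for very small parameters.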

We shall need the Chernoff bound in later analysis.

Theorem 11 (Chernoff bound) Let $X = X_1 + X_2 + \cdots + X_n$ be a sum of $n$ independent 0-1 random variables such that $0 < \Pr[X_i = 1] < 1$ holds for each $i = 1, 2, \ldots, n$, and let $\mu = E[X]$. Then for any $0 < \Delta < 1$,

\[ \Pr[X < (1-\Delta)\mu] < e^{-\mu\Delta^2/2}, \]

1: if $k < \sqrt{\epsilon}N$ then
2:   ACCEPT
3: end if
4: let $\lambda = k/N$, $\kappa = \log_{1-(\sqrt{\epsilon})^h/2}(1/6)$, $\sigma = \binom{\binom{h}{2}}{m}$ and $\theta = \max\{\log_{6f(\epsilon;H)h!/(\sigma\lambda^2)}(2/3)^{1/\kappa}, 1\}$
5: for $i = 1$ to $\kappa$ do
6:   randomly select an $h$-set $S$ from $V$
7:   if the induced subgraph of $S$ does not contain an $H$ then
8:     randomly select additional vertices $p = 6\theta h/\lambda$ times (with replacements) from $V - S$ (assume these $p$ vertices to be $x_1, x_2, \ldots, x_p$) {note there are $\binom{p}{\theta h}$ $(\theta h)$-multisets in $\{x_1, x_2, \ldots, x_p\}$}
9:     for $j = 1$ to $\binom{p}{\theta h}$ do
10:       let $S_j$ be the $j$th $(\theta h)$-multiset selected in step 8
11:       if the induced subgraph of $S_j \cup S$ contains no copies of $H$ then
12:         ACCEPT
13:       end if
14:     end for
15:   end if
16: end for
17: REJECT

Figure 5.1: The $\epsilon$-tester for property $P_{k,H}$.


where e is the base of the natural logarithm.

Note that in property $P_{k,H}$, $h$ is a constant. Hence $f(\epsilon; H)$ is a function in $\epsilon$ only.

We assume that H is a fixed digraph with h vertices and m edges and recall that G is the input digraph with N vertices from now on.

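Definition 12 Let $0 < \epsilon < 1$, $N, k \in \mathbb{N}$, $\lambda = k/N$, $H$ be a fixed digraph with $h$ vertices, $m$ be the number of edges in $H$, $\sigma = \binom{\binom{h}{2}}{m}$, $\kappa = \log_{1-(\sqrt{\epsilon})^h/2}(1/6) = \Theta(1/\epsilon^{h/2})$, and $\theta = \max\{\log_{6f(\epsilon;H)h!/(\sigma\lambda^2)}(2/3)^{1/\kappa}, 1\} = \Theta(f(\epsilon; H))$ when $f(\epsilon; H)$ is only dependent on $1/\epsilon$. If the value of $f(\epsilon; H)$ is large enough such that

\[ \left(\frac{f(\epsilon;H)h!}{\binom{\binom{h}{2}}{m}\lambda}\right)^{\theta} \ge (\lambda/6)^{\theta}(2/3)^{1/\kappa}, \]

then we say $f(\epsilon; H)$ satisfies condition 1.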
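To make the parameters of Definition 12 concrete, the following Python sketch evaluates $\sigma$, $\kappa$ and $\theta$ and tests condition 1 numerically. The value passed as `f_value` stands in for $f(\epsilon; H)$ and is purely an assumption for illustration, since the text does not give a closed form for it; the function names are likewise hypothetical.

```python
import math


def sigma(h, m):
    """sigma = C(C(h, 2), m), as in Definition 12."""
    return math.comb(math.comb(h, 2), m)


def kappa(eps, h):
    """Solves [1 - (sqrt(eps))^h / 2]^kappa = 1/6 for kappa."""
    return math.log(1 / 6) / math.log(1 - math.sqrt(eps) ** h / 2)


def theta(eps, h, m, lam, f_value):
    """max{ log_base (2/3)^(1/kappa), 1 } with base = 6 f h! / (sigma lam^2)."""
    base = 6 * f_value * math.factorial(h) / (sigma(h, m) * lam ** 2)
    if base == 1:
        return 1.0  # the logarithm is undefined for base 1; fall back to 1
    target = (2 / 3) ** (1 / kappa(eps, h))
    return max(math.log(target) / math.log(base), 1.0)


def satisfies_condition_1(eps, h, m, lam, f_value):
    """Check (f h! / (sigma lam))^theta >= (lam/6)^theta (2/3)^(1/kappa)."""
    th = theta(eps, h, m, lam, f_value)
    lhs = (f_value * math.factorial(h) / (sigma(h, m) * lam)) ** th
    rhs = (lam / 6) ** th * (2 / 3) ** (1 / kappa(eps, h))
    return lhs >= rhs
```

For instance, with $\epsilon = 0.25$, $h = 2$, $m = 1$ and $\lambda = 0.5$, an assumed value of 0.05 for $f(\epsilon; H)$ satisfies condition 1, whereas an assumed value of 0.01 does not.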

Fact 13 ([7]) For a connected $H$, $f(\epsilon; H)$ has a polynomial dependency on $1/\epsilon$ if and only if the core of $H$ is either an oriented tree or a directed cycle of length 2.

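By Fact 13, $f(\epsilon; H)$ has a polynomial dependency on $1/\epsilon$ for many $H$. Since the value of $f(\epsilon; H)$ is independent of $h$ and $m$ and $(2/3)^{1/(\theta\kappa)} \le 1$, assuming $f(\epsilon; H) = O((1/\epsilon)^j)$, we can find a smaller

\[ \epsilon = O\left(\left[\frac{\binom{\binom{h}{2}}{m}}{h!} \cdot \frac{\lambda^2}{6}\left(\frac{2}{3}\right)^{1/(\theta\kappa)}\right]^{-j}\right) \]

such that

\[ f(\epsilon; H) \ge \frac{\binom{\binom{h}{2}}{m}}{h!} \cdot \frac{\lambda^2}{6}\left(\frac{2}{3}\right)^{1/(\theta\kappa)}, \]

i.e.,

\[ \left(\frac{f(\epsilon; H)h!}{\binom{\binom{h}{2}}{m}\lambda}\right)^{\theta} \ge (\lambda/6)^{\theta}(2/3)^{1/\kappa}; \]

hence $f(\epsilon; H)$ satisfies condition 1.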


Claim 14 Assume $0 < \epsilon < 1$, $N, k \in \mathbb{N}$ and $k \ge \sqrt{\epsilon}N$. Suppose the input digraph $G = (V, E)$ with $N$ vertices contains an $H$-free $k$-induced subgraph, say $K = (V_k, E_k)$. The probability of $S \subseteq V_k$ for a random $h$-subset $S \subseteq V$ is greater than $(\sqrt{\epsilon})^h/2$ for $N$ large enough.

Proof: The probability of $S \subseteq V_k$ for a random $h$-set $S$ is

\[ \frac{\binom{k}{h}}{\binom{N}{h}} = \frac{k(k-1)\cdots(k-h+1)}{N(N-1)\cdots(N-h+1)}. \]

Since $k \ge \sqrt{\epsilon}N$, the above probability is at least

\[ \frac{\sqrt{\epsilon}N(\sqrt{\epsilon}N-1)\cdots(\sqrt{\epsilon}N-h+1)}{N(N-1)\cdots(N-h+1)} > \frac{(\sqrt{\epsilon})^h}{2} \]

for $N$ large enough. Q.E.D.
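The role of "for $N$ large enough" in Claim 14 can be seen numerically. The Python sketch below, with a hypothetical function name and illustrative values, computes the exact probability that a uniformly random $h$-subset lies inside a fixed $k$-subset.

```python
from fractions import Fraction
from math import comb


def prob_subset_in_vk(N, k, h):
    """Exact probability that a random h-subset of an N-set lies in a fixed k-subset."""
    return Fraction(comb(k, h), comb(N, h))
```

With $\epsilon = 1/25$ (so $\sqrt{\epsilon} = 1/5$), $h = 3$, $N = 10000$ and $k = 2000$, the probability exceeds $(\sqrt{\epsilon})^h/2 = 1/250$. With $N = 10$ and $k = 2$ it is zero, because a 2-set has no 3-subsets, so the bound genuinely needs $N$ to be large.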

Claim 15 Let $0 < \epsilon < 1$, $N, k \in \mathbb{N}$, $\lambda = k/N$, $H$ be a fixed digraph with $h$ vertices and $m$ be the number of edges in $H$. Suppose the input graph $G = (V, E)$ with $N$ vertices is $\epsilon$-far from any digraph having property $P_{k,H}$. The probability of finding an $h$-set whose induced subgraph contains copies of $H$ is at least $f(\epsilon; H)h!/\left[\binom{\binom{h}{2}}{m}\lambda\right]$.

Proof: By Corollary 10, each $k$-induced subgraph of $G$ contains at least $f(\epsilon; H)N^h/\left[\binom{\binom{h}{2}}{m}\right]$ $h$-sets whose induced subgraphs contain copies of $H$. Therefore, by dividing $V$ into $N/k$ $k$-sets, we can find at least

\[ \left[f(\epsilon; H)N^h\Big/\binom{\binom{h}{2}}{m}\right](N/k) = f(\epsilon; H)N^h\Big/\left[\binom{\binom{h}{2}}{m}\lambda\right] \]

$h$-sets whose induced subgraphs contain copies of $H$ in $G$, and the probability of finding an $h$-set whose induced subgraph contains copies of $H$ is at least

\[ \frac{f(\epsilon; H)N^h}{\binom{\binom{h}{2}}{m}\lambda\binom{N}{h}} = \frac{f(\epsilon; H)N^h \cdot \frac{1}{\lambda} \cdot h!}{N(N-1)\cdots(N-h+1)\binom{\binom{h}{2}}{m}} > \frac{f(\epsilon; H)h!}{\binom{\binom{h}{2}}{m}\lambda}. \]

Q.E.D.

The following theorem proves the testability of P_k,H.

Theorem 16 Let $0 < \epsilon < 1$, $0 < k < N$ be an integer and $H$ be a fixed digraph. If $f(\epsilon; H)$ satisfies condition 1, the property $P_{k,H}$ is testable with a query complexity independent of the input size.

Proof: Suppose $k < \sqrt{\epsilon}N$. Then the number of edges in a $k$-induced subgraph is less than $\epsilon N^2$. The input graph $G$, therefore, cannot be $\epsilon$-far from any digraph which has property $P_{k,H}$, and we can simply accept it. Assume $k \ge \sqrt{\epsilon}N$ for the rest of the proof.

Suppose the input digraph $G = (V, E)$ contains an $H$-free $k$-induced subgraph, say $K = (V_k, E_k)$. The probability that the algorithm accepts $G$ is at least the probability of selecting a subset of $V_k$ in step 6 of the algorithm in Fig. 5.1 and the tester accepts in step 12 for some $j$.

By Claim 14, the probability of $S \not\subseteq V_k$ is at most $1 - (\sqrt{\epsilon})^h/2$. As we independently select $\kappa$ $h$-sets $S$, the probability of $S \not\subseteq V_k$ for all $\kappa$ of them is at most $\left[1 - (\sqrt{\epsilon})^h/2\right]^{\kappa} = 1/6$. Assume $S \subseteq V_k$ from now on. We randomly select $p$ other vertices (with replacements) in step 8. Denote the $j$th such $(\theta h)$-multiset by $S_j$. The algorithm then checks if the induced subgraph of $S_j \cup S$ contains a copy of $H$.


Let event $B$ mean that $S_j \cup S$ contains a copy of $H$ for all $j$. Given $S \subseteq V_k$, if at least $\theta h$ vertices are selected from $V_k$ in step 8, then event $B$ will not occur (note that $\theta \ge 1$). Thus the probability of event $B$ is at most the probability that the algorithm selects fewer than $\theta h$ vertices from $V_k$ in step 8. Let $y$ be the number of vertices among the $p$ vertices selected in step 8 that belong to $V_k$ (with multiplicity counted). Then $\Pr[\text{event } B] \le \Pr[y < \theta h]$. We bound this probability from above by the Chernoff bound. As the probability of selecting a vertex in $V_k$ is $k/N = \lambda$ and the total number of selections is $p = 6\theta h/\lambda$, we have $\mu = E[y] = (6\theta h/\lambda)\lambda = 6\theta h$. Rewrite $\Pr[y < \theta h] = \Pr[y < (1-\Delta)6\theta h]$, where $\Delta = 5/6$. By the Chernoff bound, $\Pr[\text{event } B] \le e^{-\mu\Delta^2/2} = e^{-6\theta h(5/6)^2/2} = e^{-25\theta h/12}$. Since $\theta h > 1$, $\Pr[\text{event } B] < e^{-2} < 1/6$. Hence the probability that we select an $h$-set from $V_k$ in step 6 that leads to acceptance in step 12 is at least $(1-1/6)(1-1/6) > 2/3$.

The probability that a digraph $G$ which has property $P_{k,H}$ will be rejected is thus less than 1/3. See Fig. 5.2 for illustration.
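The Chernoff step can be checked against the exact binomial tail. In the Python sketch below, the helper names and the sample parameters are assumptions for illustration; `prob_fewer_than` computes $\Pr[y < \theta h]$ exactly for $y \sim \mathrm{Binomial}(p, \lambda)$, and `chernoff_bound` evaluates $e^{-\mu\Delta^2/2}$ with $\mu = 6\theta h$ and $\Delta = 5/6$.

```python
import math


def prob_fewer_than(p, lam, threshold):
    """Exact Pr[y < threshold] for y ~ Binomial(p, lam)."""
    return sum(
        math.comb(p, y) * lam ** y * (1 - lam) ** (p - y)
        for y in range(threshold)
    )


def chernoff_bound(theta, h):
    """exp(-mu * Delta^2 / 2) with mu = 6*theta*h and Delta = 5/6."""
    mu = 6 * theta * h
    return math.exp(-mu * (5 / 6) ** 2 / 2)
```

For $\theta = 1$, $h = 2$ and $\lambda = 0.5$ (so $p = 24$), the exact tail is far below the Chernoff bound $e^{-25/6}$, which in turn is below $1/6$.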

On the other hand, suppose the input graph $G = (V, E)$ is $\epsilon$-far from any digraph which has property $P_{k,H}$. Obviously, the probability that the algorithm accepts is equal to the probability that we find an $h$-set $S$ whose induced subgraph does not contain an $H$, and after we randomly select $p$ additional vertices (with replacements), there exists a $(\theta h)$-multiset $S_j$ from those $p$ selected vertices such that the induced subgraph of $S_j \cup S$ contains no copies of $H$. By Claim 15, the probability of finding an $h$-set that contains copies of $H$ is at least $f(\epsilon; H)h!/\left[\binom{\binom{h}{2}}{m}\lambda\right]$. For each $(\theta h)$-multiset $S_j$, at least $\theta$ disjoint $h$-sets are checked; hence the probability that $S_j \cup S$ contains copies of $H$ is at least

\[ \left(\frac{f(\epsilon;H)h!}{\binom{\binom{h}{2}}{m}\lambda}\right)^{\theta} = (\lambda/6)^{\theta} \cdot (2/3)^{1/\kappa}. \]

We then test $\binom{p}{\theta h}$ $(\theta h)$-multisets in step 12. Since

\[ \binom{p}{\theta h} = \frac{(6\theta h/\lambda)!}{(\theta h)!\,[(6\theta h/\lambda) - \theta h]!} = \frac{(6\theta h/\lambda)[(6\theta h/\lambda) - 1]\cdots[(6\theta h/\lambda) - \theta h + 1]}{(\theta h)!} > (6/\lambda)^{\theta h} > (6/\lambda)^{\theta}, \]

the probability that the induced subgraph of $S_j \cup S$ contains copies of $H$ for all $j$ is at least $(6/\lambda)^{\theta} \cdot (\lambda/6)^{\theta} \cdot (2/3)^{1/\kappa} = (2/3)^{1/\kappa}$. So, for each $h$-set $S$ that passes the test in step 7, the probability that $S$ does not lead to acceptance in step 12 is at least $(2/3)^{1/\kappa}$. Hence, regardless of whether $S$ passes the test in step 7, the probability that none of the $S$ leads to acceptance in step 12 is at least $\left[(2/3)^{1/\kappa}\right]^{\kappa} = 2/3$. Therefore, the probability that the algorithm accepts the input is less than 1/3.
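The counting inequality $\binom{p}{\theta h} > (6/\lambda)^{\theta h}$ used in this argument can be verified for concrete values. The Python sketch below is illustrative only; the function name is hypothetical, and it assumes $\lambda$ is given as a `Fraction` for which $p = 6\theta h/\lambda$ is an integer.

```python
from fractions import Fraction
from math import comb


def multiset_count_and_bound(theta, h, lam):
    """Return (C(p, theta*h), (6/lam)^(theta*h)) for p = 6*theta*h/lam."""
    p = Fraction(6 * theta * h) / lam
    if p.denominator != 1:
        raise ValueError("this sketch assumes p = 6*theta*h/lam is an integer")
    count = comb(int(p), theta * h)
    bound = (6 / lam) ** (theta * h)
    return count, bound
```

With $\theta = 1$, $h = 2$ and $\lambda = 1/2$ this gives $\binom{24}{2} = 276$ against $12^2 = 144$.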

The query complexity of step 7 is $O(h^2)$ and the query complexity from step 9 to step 10 is $O\left(\binom{p}{\theta h}\binom{\theta h}{2}\right)$. Since $\binom{p}{\theta h}\binom{\theta h}{2} > h^2$, the query complexity is $O\left(\kappa\binom{p}{\theta h}\binom{\theta h}{2}\right)$. This value is independent of $N$. Hence the theorem follows. Q.E.D.
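To see that the bound $\kappa\binom{p}{\theta h}\binom{\theta h}{2}$ depends only on $\epsilon$, $\lambda$, $h$ and $\theta$, it can be evaluated directly. The following Python sketch is a hedged illustration: the function name is hypothetical, $\theta$ is supplied as a small integer rather than derived from $f(\epsilon; H)$, and $\kappa$ and $p$ are rounded up to integers.

```python
import math


def query_bound(eps, lam, h, theta):
    """kappa * C(p, theta*h) * C(theta*h, 2), with kappa and p rounded up."""
    kappa = math.log(1 / 6) / math.log(1 - math.sqrt(eps) ** h / 2)
    p = math.ceil(6 * theta * h / lam)
    return math.ceil(kappa) * math.comb(p, theta * h) * math.comb(theta * h, 2)
```

Note that $N$ does not appear among the arguments. The bound grows as $\epsilon$ shrinks, since $\kappa = \Theta(1/\epsilon^{h/2})$.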

The value of $f(\epsilon; H)$ decreases extremely fast with $\epsilon$, and is independent of $N$ [7]. Although it is difficult to compute the exact value of $f(\epsilon; H)$ in general, we can estimate a lower bound of $f(\epsilon; H)$ by Szemerédi's regularity lemma, and $[(1-\epsilon)/(2+h)]^h$ is one such lower bound. In our algorithm in Fig. 5.1, $f(\epsilon; H)$ is just a coefficient. The soundness of our algorithm in Fig. 5.1 is proved in Theorem 16. We can replace $f(\epsilon; H)$ by $[(1-\epsilon)/(2+h)]^h$ in step 4 of our algorithm in Fig. 5.1 without changing the validity of Theorem 16. The consequence is that our algorithm needs to query more edges in the input digraph, but the total number of queried edges remains independent of the input size.


[Figure 5.2 (flowchart): from the input digraph, randomly select an $h$-set; if the $h$-set is $H$-free, randomly select $6/\lambda$ $(\theta h)$-sets and test whether the induced subgraph of the union of the $(\theta h)$-set and the $h$-set is $H$-free.]

Figure 5.2: Testing the property $P_{k,H}$.


Let $k > \sqrt{\epsilon}N$, $\epsilon$ be a constant and $H_1 = (V_1, E_1)$ be a digraph where $V_1 = \{v_1, v_2, \ldots, v_{d+1}\}$ and $E_1 = \{(v_1, v_{d+1}), (v_2, v_{d+1}), \ldots, (v_d, v_{d+1})\}$. It is commonly called a star graph. We can use the algorithm in Fig. 5.1 to test whether the input digraph contains an $H_1$-free $k$-induced subgraph. Obviously if a digraph is accepted by our algorithm, then the maximum indegree of this digraph is bounded by $d$ with high probability. Similarly, let $H_2 = (V_1, E_2)$ be a digraph where $E_2 = \{(v_{d+1}, v_1), (v_{d+1}, v_2), \ldots, (v_{d+1}, v_d)\}$. We can use the algorithm in Fig. 5.1 to test whether the maximum outdegree of the input digraph is bounded by $d$ with high probability.

If an input digraph is accepted by our algorithm for both $H_1$ and $H_2$, then we know that this digraph satisfies the restrictions of the BR tester. In this case, we use the BR tester to test strong connectivity of the input digraph. The total query complexity of testing strong connectivity is the sum of the query complexities of the algorithm in Fig. 5.1 and the BR tester. Since the query complexities of the algorithm in Fig. 5.1 and the BR tester are both independent of the input size, their sum remains independent of the input size.

The query complexity of our strong connectivity tester in Fig. 4.2 is the square root of the input size. Hence, the sum of the query complexities of the algorithm in Fig. 5.1 and the BR tester is less than the query complexity of our strong connectivity tester in Fig. 4.2. Since the main efficiency parameter of a method to solve a property testing problem is its query complexity, our strong connectivity tester is not the most efficient one for all digraphs. It is better to use the algorithm in Fig. 5.1 to determine which tester (the BR tester or our strong connectivity tester) should be used to test the strong connectivity of the digraph.
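The link between the star digraphs and degree bounds can be illustrated with a short Python sketch. It assumes a simple digraph (no parallel edges) given as an edge list on vertices $0, \ldots, n-1$; the function names are hypothetical. It uses the observation that such a digraph contains a copy of $H_1$ exactly when some vertex has indegree at least $d$, and a copy of $H_2$ exactly when some vertex has outdegree at least $d$.

```python
def in_star(d):
    """Edge list of H_1: vertices 0..d-1 all point to the centre d."""
    return [(i, d) for i in range(d)]


def out_star(d):
    """Edge list of H_2: the centre d points to vertices 0..d-1."""
    return [(d, i) for i in range(d)]


def max_degrees(edges, n):
    """Maximum indegree and outdegree of a digraph on vertices 0..n-1."""
    indeg = [0] * n
    outdeg = [0] * n
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return max(indeg), max(outdeg)


def is_star_free(edges, n, d):
    """Return (H_1-free?, H_2-free?) for a simple digraph."""
    max_in, max_out = max_degrees(edges, n)
    return max_in < d, max_out < d
```

A directed 4-cycle, for example, is free of both stars for $d = 2$, while the star $H_1$ itself is not $H_1$-free.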


