## National Taiwan University, College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering

### Doctoral Dissertation

## Property Testing on Directed Graphs and Abelian Groups

## Yen-Wu Ti (狄彥吾)

### Advisor: Yuh-Dauh Lyuu, Ph.D. (呂育道)

### June 2009

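## Acknowledgements

First, I sincerely thank my advisor, Dr. Yuh-Dauh Lyuu (呂育道). His careful teaching gave me a glimpse of the depth of computer science, and his frequent discussions pointed me in the right direction; I have benefited greatly over these years. His rigor in scholarship is a model for all of us.

Over these seven years there was the shared daily life of the laboratory, the academic discussions, and the idle chatter. I thank my seniors, classmates, and juniors for their mutual encouragement; your company made seven years of research life colorful.

I thank my seniors 周立平 and 戴天時 for tirelessly pointing out the flaws in my research and for always resolving my doubts when I was lost. I also thank my classmates 袁勤國, 陳俊諺, 張文彥, and 林敏順 for their help; congratulations to us for making it through these years. Nor can I forget my juniors in the laboratory, 張中平, 馬德文, 張經略, 林宏佑, and 陳絢昌; I am deeply grateful for your help.

Finally, I dedicate this dissertation to my late parents.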

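## Abstract (Chinese)

Property testing is a widely studied topic in computer science, with applications in areas such as network topology and program debugging. Bender and Ron developed a property-testing algorithm for testing whether certain digraphs are strongly connected; this dissertation calls it the BR algorithm. The BR algorithm applies only to digraphs that satisfy specific conditions, which limits its applicability. This dissertation develops an unrestricted algorithm that can test whether any digraph is strongly connected.

Compared with the restricted BR algorithm, the strong connectivity tester developed in this dissertation has no restriction on its use but is less efficient. Therefore, for the few digraphs that satisfy the restrictions of the BR algorithm, we still prefer the BR algorithm for testing strong connectivity; for the digraphs that do not, we use the algorithm developed in this dissertation.

This dissertation then develops an algorithm that helps us choose the appropriate strong connectivity tester between the two.

For a fixed digraph H, Alon and Shapira proved that for any digraph G, if a large number of directed edges of G must be removed to eliminate every subgraph of G isomorphic to H, then the number of subgraphs of G isomorphic to H has a lower bound. A subgraph of a digraph is called H-free if no part of it is isomorphic to H.

Using the result of Alon and Shapira, this dissertation develops an algorithm for testing whether a digraph contains an H-free subgraph induced by k vertices. As noted above, when we must choose between the two strong connectivity testers, this H-free subgraph tester can help us choose the appropriate one.

The last part of this dissertation uses the strong connectivity tester to develop a testing algorithm for a group property. The generators of a group generate the entire group, and for an abelian group the number of generators is a topic of considerable research interest. Using the strong connectivity tester above, this dissertation develops an efficient algorithm that tests whether a set together with a binary operation is an abelian group with fewer than k generators.

Keywords: property testing, randomized algorithm, digraph, strong connectivity, abelian group, generator.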

## Abstract

Bender and Ron construct a restricted tester for the strong connectivity of digraphs (we call it the BR tester). We generalize the BR tester to test the strong connectivity of arbitrary digraphs.

For any digraph $H$ and any digraph $G$ that is far from every $H$-free digraph, Alon and Shapira prove a lower bound on the number of copies of $H$ in $G$. After solving the problem of testing the strong connectivity of digraphs, we use Alon and Shapira's result to construct a randomized algorithm for testing whether a digraph has an $H$-free $k$-induced subgraph.

Our strong connectivity tester has no restriction but must query the input more times than the restricted BR tester. If an input digraph satisfies the restrictions of the BR tester, then using the BR tester to test its strong connectivity is more efficient than using our strong connectivity tester. When we want to test the strong connectivity of a digraph, our randomized algorithm for testing digraphs with an $H$-free $k$-induced subgraph can help us determine which tester should be used: the BR tester or our strong connectivity tester.

A generator set for a finite group is a subset of the group elements such that repeated multiplications of the generators alone can produce all the group elements. The number of generators of an abelian group is an important issue in many studies. In most cases, it is not easy to identify whether a group-like structure is an abelian group with $k$ generators for a constant $k$. We construct an efficient randomized algorithm that, given a finite set with a binary operation, tests if it is an abelian group with a $k$-generator set. If $k$ is not too large, the query complexity of our algorithm is polylogarithmic in the size of the groundset. Otherwise the query complexity is at most the square root of the size of the groundset.

Keywords: Property testing; Strong connectivity; $H$-free subgraph; Abelian group; Generator.

## Contents

1 Introduction

2 Background

2.1 Question of property testing

2.2 Property testing on combinatorial objects

2.3 Property testing and learning theory

3 Testing of Digraph Properties

3.1 Property testing on digraphs

3.2 Research work related to graph property testing

3.3 Reduction between group properties and digraph properties

4 Testing Strong Connectivity on Digraphs

4.1 Strongly connected component

4.2 Tester construction

5 Testing Whether a Digraph Contains $H$-free $k$-induced Subgraphs

5.1 Existence of $H$-free $k$-induced subgraphs is $\Omega(N^2)$-evasive

5.2 Tester construction

6 Testing of Group Properties

6.1 Finite group-like structure

6.2 Research work related to group property testing

6.3 Tester construction

7 Conclusion

Bibliography

## Chapter 1 Introduction

This world is full of decision problems, and we need to make decisions every day. In computer science, a decision problem asks if an object has a predetermined property.

Unfortunately, sometimes no fast algorithms exist that give the exact answer. In these cases, an approximate answer within a reasonable complexity is an attractive alternative.

A property-testing algorithm offers such answers. For a fixed property $P$ and any object $O$, the property-testing algorithm determines whether $O$ has property $P$, or whether $O$ is far from having property $P$ (i.e., far from any other object having $P$). Its answer may, however, be arbitrary on objects falling between the two categories. For example, the object can be a graph and the property can be 3-colorability. The task should be performed by querying the object (in as few places as possible). In the example, what we query is the existence of an edge between two vertices.

Many recent research results concern the testing of graph properties and group properties. In computer science, the general notion of property testing is first formulated by Rubinfeld and Sudan [64]. In their formulation, a property testing algorithm for property $P$ is given oracle access to the tested object. Distance between instances is measured in terms of the fraction of arguments in the domain.

Property testing emerges naturally in the context of program checking and probabilistically checkable proofs (PCP). Specifically, in the context of program checking, one may choose to test if the program satisfies certain properties before checking that it computes a specified function. This paradigm has been followed both in the theory of program checking [22, 64] and in practice, where programmers often first test their programs by verifying that the programs satisfy properties that are known to be satisfied by the functions they compute. In the context of probabilistically checkable proofs, the property tested is being a codeword with respect to a specific code. This paradigm, explicitly introduced in Babai, Fortnow, Levin and Szegedy's result, has shifted to testing Hadamard codes, and then to testing the long code [10, 13, 15, 16, 43, 44, 54, 68]. All of these papers have focused on property testing of algebraic properties such as linearity, multi-linearity and being a low-degree polynomial.

The number of generators of a group is an important issue in many studies. Knowing the number of generators of a group leads to a deeper understanding of the structure of the group. It may help us discover the features of the group quickly.

A group in which every element commutes with its endomorphic images is called an $E$-group. A generator set of size $k$ is called a $k$-generator set, and a group in which all elements commute is called an abelian group. We know that every $E$-group with a 2-generator set is abelian and all $E$-groups with a 3-generator set are nilpotent of class at most 2 [2]. We also know that we need at least four generators to generate a finite non-abelian $E$-group [1]. A group with a 2-generator set must be isomorphic to a proper factor group [61]. The number of generators of a group has also been intensively studied [24, 25, 26, 48, 55, 58, 59].

This dissertation presents a method to combine the testing algorithms of digraphs and groups to test whether a group-like structure is an abelian group with a $k$-generator set. The first part of this dissertation tests whether a digraph is strongly connected. Bender and Ron construct a restricted tester for the strong connectivity of digraphs (we call it the BR tester) [17]. Some instances do not satisfy the restrictions of the BR tester. We generalize the BR tester to test the strong connectivity of arbitrary digraphs.

For any digraph $H$ and any digraph $G$ that is far from every $H$-free digraph, Alon and Shapira prove a lower bound on the number of copies of $H$ in $G$. After solving the problem of testing the strong connectivity of digraphs, we use Alon and Shapira's result to construct a randomized algorithm for testing whether a digraph has an $H$-free $k$-induced subgraph.

Our strong connectivity tester has no restriction but must query the input more times than the restricted BR tester. If an input digraph satisfies the restrictions of the BR tester, then using the BR tester to test its strong connectivity is more efficient than using our strong connectivity tester. When we want to test the strong connectivity of a digraph, our randomized algorithm for testing digraphs with an $H$-free $k$-induced subgraph can help us determine which tester should be used: the BR tester or our strong connectivity tester.

It is not easy to identify whether a group-like structure is an abelian group with a $k$-generator set for any given constant $k$. In the last part of this dissertation, we combine the testing algorithm for the abelian property of groups and the testing algorithm for the strong connectivity of digraphs to form the testing algorithm for a group-like structure being an abelian group with a $k$-generator set. Our method is to use the strong connectivity of Cayley graphs to test if a finite group-like structure has a $k$-generator set. Before that, we should test if the input group-like structure is an abelian group. Friedl, Ivanyos and Santha construct a tester which, given a finite group-like structure, tests if it is an abelian group (we call it the FIS tester) [37]. Combining the FIS tester and our strong connectivity tester, we can construct a testing algorithm for a group-like structure being an abelian group with a $k$-generator set.

## Chapter 2 Background

### 2.1 Question of property testing

We are interested in the following question of property testing:

Let $\Pi$ be a fixed property, and $t$ be an instance. Our goal is to determine (possibly probabilistically) if $t$ has property $\Pi$ or if it is far from any instance that has property $\Pi$, where the distance between instances is measured with respect to the uniform probability distribution on the domain of $t$. Towards this goal, we are allowed to select some elements from $t$ and query specific information about $t$ on elements of our choice.

Let $T$ be the class of instances that satisfy property $\Pi$. Then, testing property $\Pi$ corresponds to testing membership in the class $T$. The two most relevant parameters to property testing are the distance, hereafter denoted $\epsilon$, and the desired confidence, denoted $p$. We require the tester to accept each instance in $T$ and reject every instance that is more than $\epsilon$ away from any instance in $T$. We allow the tester to be probabilistic and to make incorrect positive and negative assertions with probability at most $p$. The complexity measure we focus on is the query complexity (the number of queries made). We believe that property testing is a natural notion whose relevance to applications goes beyond program checking, and whose scope goes beyond the realm of testing algebraic properties.

### 2.2 Property testing on combinatorial objects

Working within the above framework, we venture into the domain of combinatorial objects. In particular, we study testing group properties and graph properties, and demonstrate its relevance to the notions of approximation. We hope to derive extremely efficient algorithms for testing natural properties.

We only consider the uniform probability distribution on these combinatorial objects, as well as algorithms that only obtain random samples.

### 2.3 Property testing and learning theory

Our formulation of testing mimics the standard frameworks of learning theory. Suppose the property $\Pi$ is a set of functions. In both property testing and learning theory, one is given access to an unknown target function $f$. However, there are two important differences between them. First, the goal of a learning algorithm is to find a good approximation to the target function $f \in \Pi$, whereas a testing algorithm should only determine whether the target function is in $\Pi$ or far away from it. This makes the task of testing seem easier than that of learning. That impression is misleading because of the second difference: a learning algorithm should perform well only when the target function belongs to $\Pi$, whereas a testing algorithm must perform well in such cases as well as on functions far away from $\Pi$.

Goldreich, Goldwasser and Ron show that the relation between learning and testing is nontrivial. On one hand, proper learning (i.e., when the hypothesis of the learning algorithm must belong to the same class as the target function) implies testing. On the other hand, there are function classes for which testing is harder than (nonproper) learning (i.e., when the hypothesis is not required to belong to the same class as the target function), provided $\mathrm{NP} \not\subset \mathrm{BPP}$ [41].

## Chapter 3 Testing of Digraph Properties

### 3.1 Property testing on digraphs

We define property testing for digraphs next. Let $\Pi$ be a property of digraphs, that is, a family of digraphs closed under isomorphism. A digraph $G = (V, E)$ is $\epsilon$-close to having property $\Pi$ if there exists a digraph $G' = (V, E')$ having property $\Pi$ such that the symmetric difference between $E$ and $E'$ has size at most $\epsilon\binom{|V|}{2}$. We say that a digraph $G$ is $\epsilon$-far from having property $\Pi$ if it is not $\epsilon$-close to having property $\Pi$.
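The distance condition in this definition can be made concrete with a short Python sketch. It only checks whether one given candidate digraph $G'$ witnesses $\epsilon$-closeness; deciding $\epsilon$-closeness in general would require searching over all digraphs having $\Pi$. The function name `witnesses_eps_closeness` and the edge-list representation are assumptions made for this illustration and do not come from the dissertation.

```python
from math import comb


def witnesses_eps_closeness(n, edges_g, edges_g_prime, eps):
    """Return True if G' differs from G in at most eps * C(n, 2) directed edges.

    n             -- number of vertices shared by G and G'
    edges_g       -- iterable of directed edges (u, v) of G
    edges_g_prime -- iterable of directed edges (u, v) of the candidate G'
    eps           -- distance parameter
    """
    # Symmetric difference: edges present in exactly one of the two digraphs.
    symmetric_difference = set(edges_g) ^ set(edges_g_prime)
    return len(symmetric_difference) <= eps * comb(n, 2)
```

For example, with four vertices, a path `[(0, 1), (1, 2)]` and the directed cycle `[(0, 1), (1, 2), (2, 3), (3, 0)]` differ in two edges, so the cycle witnesses closeness for $\epsilon = 0.5$ (budget 3 edges) but not for $\epsilon = 0.25$ (budget 1.5 edges).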

An $\epsilon$-tester (or simply a tester) for a digraph property $\Pi$ is a randomized algorithm that is given a size parameter $N$, a distance parameter $\epsilon$ and the ability to make queries as to whether a directed edge of the input digraph $G$ with $N$ vertices exists. The total number of queries is called the query complexity of the tester. Let $\{g_i\}$ be the set of digraphs with $N$ vertices that satisfy $\Pi$. The algorithm needs to distinguish with probability at least 2/3 between the case of $G$ having $\Pi$ and the case of $G$ differing from any $g_i$ in at least $\epsilon\binom{N}{2}$ edges [7]. In the latter case, $G$ is said to be $\epsilon$-far from property $\Pi$. More specifically, $T$ is an $\epsilon$-tester for $\Pi$ if for every $G = (V, E)$ and every $\epsilon$, the following two conditions hold:

(1) if $G$ has property $\Pi$, then $\Pr[T \text{ accepts } G] \ge 2/3$;

(2) if $G$ is $\epsilon$-far from having property $\Pi$, then $\Pr[T \text{ accepts } G] \le 1/3$.

The probability 2/3 can be replaced by any constant smaller than 1 because the algorithm can be repeated if necessary. A graph property is testable if the property has a tester and the total number of queries is $o(N^2)$.
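The remark that the algorithm can be repeated is commonly realized by running the tester several times and taking a majority vote. The Python sketch below illustrates that standard technique under stated assumptions: the wrapper name `amplify` and the convention that a tester is a callable returning `True` for ACCEPT are hypothetical choices for this example, not part of the dissertation.

```python
def amplify(tester, rounds):
    """Wrap `tester` so that it runs `rounds` times and returns the majority answer.

    tester -- callable returning True (ACCEPT) or False (REJECT)
    rounds -- odd number of independent repetitions
    """
    if rounds % 2 == 0:
        raise ValueError("use an odd number of rounds to avoid ties")

    def amplified(*args, **kwargs):
        # Count how many of the independent runs accept the input.
        accepts = sum(1 for _ in range(rounds) if tester(*args, **kwargs))
        return 2 * accepts > rounds

    return amplified
```

For a tester with one-sided error, a simpler rule would also work: reject as soon as any single run rejects. The majority vote is shown here because it covers the two-sided definition given above.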

### 3.2 Research work related to graph property testing

A testing algorithm for a graph property $\Pi$ can make queries on the incidence relations of vertices in an input graph $G$. Property $\Pi$ is $\Omega(N^2)$-evasive if there is no deterministic testing algorithm with query complexity $o(N^2)$ that can correctly decide if the input has $\Pi$. Holt and Reingold are the first to consider the complexity of recognizing graph properties from their adjacency matrix representations [47]. They show that the graph properties of connectivity and the existence of cycles are both $\Omega(N^2)$-evasive. An important open problem in this area is the Aanderaa-Rosenberg conjecture: Every nontrivial monotone graph property without self-loops is $\binom{N}{2}$-evasive [23, 29, 47, 53]. Rivest and Vuillemin resolve a weaker version of the Aanderaa-Rosenberg conjecture [63]. The weaker version says that every nontrivial monotone graph property has decision tree complexity $\Omega(N^2)$.

The testing of graph properties is pioneered by Goldreich, Goldwasser and Ron [41]. They show that all graph properties describable by the existence of a partition of a certain type are testable. For a fixed digraph $H$ with at least one edge, let $P_H$ denote the property of the input digraph being $H$-free. In other words, the digraph $G$ has $P_H$ if and only if it contains no subgraphs isomorphic to $H$. Alon and Shapira prove that $P_H$ is testable with a total number of queries bounded only by a function of $\epsilon$, independent of $N$ [7]. This result has been improved later by Alon and Shapira [6]. Alon, Fischer, Krivelevich and Szegedy show that every first-order undirected graph property without a quantifier alternation of type "$\forall\exists$" has $\epsilon$-testers whose query complexity is independent of the size of the input graph [4]. More recently, Alon, Fischer, Newman and Shapira prove a very general result for undirected graphs, which says that the property defined by having any given Szemerédi-partition is testable with a constant number of queries [5]. Moreover, they obtain a purely combinatorial characterization of the graph properties that are testable with a constant number of queries.

The testing of other graph and combinatorial properties has also been intensively studied [30, 33, 40, 42, 51].

### 3.3 Reduction between group properties and digraph properties

In this section, we define Cayley graphs and introduce the reduction between group properties and digraph properties.

Definition 1 ([46]) Let $G$ be a group, $\circ$ be the group multiplication and $S$ be a subset of the group's elements not containing the identity element. The Cayley graph associated with $S$ is defined as the digraph having one vertex associated with each element of $G$ and a directed edge $(g, h)$ whenever $g \circ h^{-1} \in S$.

The properties of Cayley graphs have been extensively studied in graph theory.

These properties are used to develop algebraic settings for studying certain structural and algorithmic properties of the interconnection networks that underlie parallel architectures, including the hypercube, butterfly, cube-connected cycles, multiple rings and star networks [3, 8, 21, 27, 65]. Cayley graphs have also been used to study the gossiping problem in communication networks [20].

We say that a digraph is strongly connected if there is a directed path from every vertex $u$ in the digraph to every other vertex $v$.

Theorem 2 ([9]) The Cayley graph associated with a subset of a group's elements (but not containing the identity element) is strongly connected iff the subset generates the group.
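To make Definition 1 and Theorem 2 concrete, the following Python sketch builds the Cayley graph of the cyclic group $\mathbb{Z}_n$ under addition and compares its strong connectivity with a direct generation check. The choice of $\mathbb{Z}_n$ and the helper names `cayley_edges_zn`, `is_strongly_connected` and `generates_zn` are assumptions made for this illustration. The generation check uses the standard fact that a subset generates $\mathbb{Z}_n$ exactly when the greatest common divisor of $n$ and the subset's elements is 1.

```python
from collections import deque
from functools import reduce
from math import gcd


def cayley_edges_zn(n, subset):
    """Edges (g, h) of the Cayley graph of Z_n with g - h (mod n) in `subset`."""
    gens = {s % n for s in subset}
    if 0 in gens:
        raise ValueError("the subset must not contain the identity element 0")
    # g o h^{-1} = g - h = s  is equivalent to  h = g - s (mod n).
    return [(g, (g - s) % n) for g in range(n) for s in gens]


def is_strongly_connected(n, edges):
    """Check strong connectivity by a forward and a backward BFS from vertex 0."""
    out_adj = [[] for _ in range(n)]
    in_adj = [[] for _ in range(n)]
    for u, v in edges:
        out_adj[u].append(v)
        in_adj[v].append(u)

    def reaches_all(adj):
        seen = {0}
        queue = deque([0])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return len(seen) == n

    return reaches_all(out_adj) and reaches_all(in_adj)


def generates_zn(n, subset):
    """A subset generates Z_n iff gcd(n, s_1, ..., s_k) = 1."""
    return reduce(gcd, subset, n) == 1
```

In $\mathbb{Z}_{12}$, the subset $\{2, 3\}$ generates the group and its Cayley graph is strongly connected, whereas $\{4, 6\}$ generates only the even residues and its Cayley graph is not.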

Our method relies on the strong connectivity of Cayley graphs to test if a finite group-like structure $s = (\Gamma, \circ, i, 1)$ has a $k$-generator set. A subset of a groundset or a vertex set with size $k$ will be called a $k$-subset from now on. By Theorem 2, for an input group-like structure $s = (\Gamma, \circ, i, 1)$, if we can test whether there exists a $k$-subset of $\Gamma$ with a corresponding strongly connected Cayley graph, then we can test whether $s$ has a $k$-generator set. In the next chapter, we develop an algorithm for testing strong connectivity of digraphs.

## Chapter 4 Testing Strong Connectivity on Digraphs

### 4.1 Strongly connected component

In this chapter, we develop an algorithm for testing strong connectivity of digraphs; in a later chapter we rely on the strong connectivity of Cayley graphs to test if a finite group-like structure $s = (\Gamma, \circ, i, 1)$ has a $k$-generator set. For a digraph $G = (V, E)$ with indegree and outdegree bounded by $d < |V|$, Bender and Ron develop a tester for the strong connectivity of digraphs (we call it the BR tester) [17].

Theorem 3 ([17]) (1) If $G$ is $\epsilon$-far from being strongly connected with indegree and outdegree bounded by $d$, then the BR tester will reject it with probability at least 2/3. (2) The BR tester has one-sided error. (3) The query complexity is $O(1/\epsilon)$.

On the other hand, suppose there is no bound on the indegree and outdegree of the digraph $G$. We construct another tester to test the strong connectivity of digraphs in this chapter that is slightly different from the BR tester. To begin with, testing strong connectivity is trivial when the distance parameter is greater than $2/n$ for the following reason: We can always make a digraph connected by adding at most $n - 1$ edges. Hence every digraph with $n$ vertices is $(2/n)$-close to being strongly connected because $(2/n)\binom{n}{2} = n - 1$. On the other hand, $\epsilon$ should be greater than $1/\binom{n}{2}$ for, otherwise, $\epsilon\binom{n}{2} < 1$. To make sure that the problem is not trivial, we assume $1/\binom{n}{2} < \epsilon < 2/n$ from now on.

A strongly connected component of a digraph $G = (V, E)$ is a maximal subgraph $H = (V', E')$ such that there is a directed path from each vertex $p \in V'$ to every other vertex $q \in V'$. Denote the set of strongly connected components of $G$ by $\mathcal{C} = \{C_1, C_2, \ldots, C_m\}$. For vertices $u \in C_i$ and $v \in C_j$, $i \neq j$, such that $e = (u, v) \in E$, we call $e$ an outgoing edge of $C_i$ and an incoming edge of $C_j$. We call a strongly connected component a source if it has only outgoing edges, a sink if it has only incoming edges, an isolation if it has neither outgoing nor incoming edges, and a transferrer if it has both outgoing and incoming edges.
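The four kinds of strongly connected components can be computed directly from these definitions. The Python sketch below first finds the components with Kosaraju's algorithm (a standard technique chosen for this illustration, not one named in the dissertation) and then labels each vertex by the kind of its component. The function names and the edge-list input format are assumptions of the example.

```python
def strongly_connected_components(n, edges):
    """Kosaraju's algorithm; returns (comp, count) with comp[v] the component index of v."""
    out_adj = [[] for _ in range(n)]
    in_adj = [[] for _ in range(n)]
    for u, v in edges:
        out_adj[u].append(v)
        in_adj[v].append(u)

    # First pass: record vertices by DFS finishing time (iterative DFS).
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            v, i = stack.pop()
            if i < len(out_adj[v]):
                stack.append((v, i + 1))
                w = out_adj[v][i]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(v)

    # Second pass: sweep the reversed graph in decreasing finishing time.
    comp, count = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = count
        stack = [s]
        while stack:
            v = stack.pop()
            for w in in_adj[v]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1
    return comp, count


def classify_components(n, edges):
    """Label every vertex with the kind of its strongly connected component."""
    comp, count = strongly_connected_components(n, edges)
    has_out, has_in = [False] * count, [False] * count
    for u, v in edges:
        if comp[u] != comp[v]:  # edge between two different components
            has_out[comp[u]] = True
            has_in[comp[v]] = True
    names = {(True, True): "transferrer", (True, False): "source",
             (False, True): "sink", (False, False): "isolation"}
    return [names[(has_out[comp[v]], has_in[comp[v]])] for v in range(n)]
```

On the five-vertex digraph with edges `(0, 1), (1, 0), (1, 2), (2, 3)`, the component `{0, 1}` is a source, `{2}` is a transferrer, `{3}` is a sink, and the untouched vertex `4` is an isolation.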

Lemma 4 If a digraph $G$ with $n$ vertices is $\epsilon$-far from being strongly connected, then the total number of sources, sinks, and isolations in $G$ exceeds $\epsilon n^2$.

Proof: Assume the claim of the lemma is wrong and proceed to obtain a contradiction. Let the set $\mathcal{T}$ contain all transferrers and let the set $\mathcal{R}$ consist of the remaining strongly connected components. We divide the problem into two cases.

Case 1. $\mathcal{T}$ is empty.

A strongly connected component in $G$ is either a source, a sink, or an isolation, and hence a member of $\mathcal{R}$. Pick a vertex from each strongly connected component of $G$. As $G$ has $|\mathcal{R}|$ strongly connected components, $|\mathcal{R}|$ vertices are chosen. Turn these $|\mathcal{R}|$ vertices into a directed cycle by adding at most $|\mathcal{R}|$ directed edges to $G$. Now, for any ordered pair of vertices $(u, v)$ in $V$, there is a directed path from $u$ to $v$; hence $G$ is strongly connected. Recall that an $\epsilon$-far digraph differs from the class of strongly connected digraphs by at least $\epsilon n^2 + 1$ edges. However, this is a contradiction, since $|\mathcal{R}| \le \epsilon n^2$. See Fig. 4.1 for illustration.

Figure 4.1: Testing strong connectivity of digraphs. (One vertex is chosen from each strongly connected component, and the chosen vertices are turned into a directed cycle.)

Case 2. $\mathcal{T}$ is not empty.

A chain of strongly connected components $(C_1, C_2, \ldots, C_n)$ consists of strongly connected components such that between any two adjacent strongly connected components $C_i$ and $C_{i+1}$, there is a directed edge from a vertex of $C_i$ to a vertex of $C_{i+1}$. For any $T \in \mathcal{T}$, there is a longest chain of strongly connected components $S$ in $\mathcal{T}$ containing $T$ (i.e., all members of $S$ are transferrers, and $T$ is one of them). Let $S^h$ be the head of the chain $S$. $S^h$ is a strongly connected component with both incoming and outgoing edges because $S^h \in \mathcal{T}$. There is no directed edge from other strongly connected components in $\mathcal{T}$ to $S^h$ for, otherwise, $S$ would not be the longest chain containing $T$. Therefore, all the incoming edges of $S^h$ are outgoing edges of some sources. By the same token, all the outgoing edges of the tail $S^t$ of $S$ are incoming edges of some sinks. Consequently, for any $T \in \mathcal{T}$, we can always find a vertex $p$ from a source and a vertex $q$ from a sink such that there is a directed path from $p$ to $q$ passing through a vertex of $T$. As a result, if all the sources and sinks are turned into one single strongly connected component, the transferrers will become part of the same component, too.

Therefore, if we can turn all members of $\mathcal{R}$ into one single strongly connected component $R'$, then for all $T_i, T_j \in \mathcal{T}$ and all vertices $x \in T_i$ and $y \in T_j$, there exists a directed path from $x$ through some vertices in $R'$ to $y$. So we can ignore the transferrers and concentrate on how to make all members of $\mathcal{R}$ become one single strongly connected subgraph. We have thus reduced this case to case 1, where $\mathcal{T}$ is empty. The number of directed edges we need to add to $G$ is thus at most $|\mathcal{R}|$, too. If $|\mathcal{R}|$ is no greater than $\epsilon n^2$, we get a contradiction. Q.E.D.
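The construction used in both cases of this proof, namely picking one vertex from every source, sink and isolation and joining the picked vertices into a directed cycle, can be tried out on small digraphs. The Python sketch below is an illustration under assumptions: the helper names are hypothetical, and components are found by a simple (quadratic-time) mutual-reachability computation rather than by any method stated in the dissertation.

```python
from collections import deque


def _adjacency(n, edges):
    out_adj = [[] for _ in range(n)]
    in_adj = [[] for _ in range(n)]
    for u, v in edges:
        out_adj[u].append(v)
        in_adj[v].append(u)
    return out_adj, in_adj


def _reachable(adj, start):
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def is_strongly_connected(n, edges):
    out_adj, in_adj = _adjacency(n, edges)
    return len(_reachable(out_adj, 0)) == n and len(_reachable(in_adj, 0)) == n


def cycle_through_non_transferrers(n, edges):
    """Return the cycle edges joining one vertex from each source, sink and isolation."""
    out_adj, in_adj = _adjacency(n, edges)
    representatives, assigned = [], set()
    for v in range(n):
        if v in assigned:
            continue
        # The component of v is the set of vertices mutually reachable with v.
        component = _reachable(out_adj, v) & _reachable(in_adj, v)
        assigned |= component
        has_out = any(w not in component for u in component for w in out_adj[u])
        has_in = any(w not in component for u in component for w in in_adj[u])
        if not (has_out and has_in):  # skip transferrers
            representatives.append(v)
    if len(representatives) <= 1:
        return []  # a single component: the digraph is already strongly connected
    r = len(representatives)
    return [(representatives[i], representatives[(i + 1) % r]) for i in range(r)]
```

For the digraph with edges `(0, 1), (1, 0), (1, 2), (2, 3)` on five vertices, the representatives are 0 (source), 3 (sink) and 4 (isolation), so three edges are added and the result is strongly connected even though the transferrer `{2}` received no new edge.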

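Lemma 5 A digraph $\epsilon$-far from being strongly connected must have at least $(\epsilon n^2)/2$ strongly connected components each containing fewer than $\left\lceil \frac{1}{n}\left(\frac{2}{\epsilon} - 1\right)\right\rceil$ vertices.

Proof: Assume the claim is wrong. Lemma 4 says that this digraph has at least $\epsilon n^2 + 1$ strongly connected components. Since every strongly connected component has at least one vertex,

$$
\begin{aligned}
n &\ge \left(\epsilon n^2 + 1 - \frac{\epsilon n^2}{2}\right) \cdot \left\lceil \frac{1}{n}\left(\frac{2}{\epsilon} - 1\right)\right\rceil + \frac{\epsilon n^2}{2} \cdot 1 \\
&> \frac{\epsilon n^2}{2} \cdot \frac{1}{n}\left(\frac{2}{\epsilon} - 1\right) + \frac{\epsilon n^2}{2} \\
&= \frac{\epsilon n}{2}\left(\frac{2}{\epsilon} - 1\right) + \frac{\epsilon n^2}{2} \\
&= n - \frac{\epsilon n}{2} + \frac{\epsilon n^2}{2},
\end{aligned}
$$

a contradiction. Q.E.D.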

### 4.2 Tester construction

For a digraph with no bound on the indegree and outdegree of each vertex, the algorithm that tests whether it is strongly connected appears in Fig. 4.2 (we call it the CONN tester). As noted earlier, testing strong connectivity is trivial when the distance parameter is greater than $2/n$. To make sure that the problem is not trivial, we assume $1/\binom{n}{2} < \epsilon < 2/n$. The following theorem analyzes the algorithm.

Theorem 6 (1) Our algorithm has one-sided error. (2) If $G$ is $\epsilon$-far from being strongly connected, then our algorithm will reject it with probability at least 2/3. (3) The query complexity is $O\left(\frac{\log_{1-(\epsilon n)/2}(1/3)}{\epsilon}\right)$.

1: Let $S = \emptyset$, $m = \lceil \log_{1-(\epsilon n)/2}(1/3) \rceil$, $x = \left\lceil \frac{1}{n}\left(\frac{2}{\epsilon} - 1\right)\right\rceil$

2: while $|S| < m$ do

3: Pick an arbitrary vertex $u$ from $V$ and add it to $S$

4: Perform BFS on $G$ starting from $u$, always using a visited vertex's incoming edges, and stop when we reach $x$ vertices

5: if we run out of new vertices then

6: REJECT

7: else

8: Perform BFS starting from $u$, always using a visited vertex's outgoing edges, and stop when we reach $x$ vertices

9: if we run out of new vertices then

10: REJECT

11: end if

12: end if

13: end while

14: ACCEPT

Figure 4.2: An $\epsilon$-tester for the strong connectivity of digraphs in the case that there is no prior bound on the indegree and outdegree of each vertex.
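The steps of Fig. 4.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the dissertation's implementation: the digraph is accessed through a hypothetical adjacency oracle `has_edge(u, v)`, the "arbitrary" vertex of line 3 is drawn uniformly at random (which is what the probability analysis uses), and the loop simply draws $m$ independent samples instead of maintaining the set $S$. The names `conn_tester` and `bounded_bfs` are invented for this example.

```python
import math
import random
from collections import deque


def bounded_bfs(n, has_edge, start, limit, incoming):
    """BFS over incoming (or outgoing) edges; True once `limit` vertices are reached."""
    visited = {start}
    if len(visited) >= limit:
        return True
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in range(n):  # one adjacency-matrix query per candidate neighbor
            if w in visited:
                continue
            edge_present = has_edge(w, v) if incoming else has_edge(v, w)
            if edge_present:
                visited.add(w)
                if len(visited) >= limit:
                    return True
                queue.append(w)
    return False  # ran out of new vertices before reaching `limit`


def conn_tester(n, has_edge, eps, rng=random):
    """Return True for ACCEPT and False for REJECT, following Fig. 4.2."""
    if not (1 / math.comb(n, 2) < eps < 2 / n):
        raise ValueError("eps must satisfy 1/C(n,2) < eps < 2/n")
    m = math.ceil(math.log(1 / 3) / math.log(1 - eps * n / 2))
    x = math.ceil((2 / eps - 1) / n)
    for _ in range(m):
        u = rng.randrange(n)
        if not bounded_bfs(n, has_edge, u, x, incoming=True):
            return False
        if not bounded_bfs(n, has_edge, u, x, incoming=False):
            return False
    return True
```

With $n = 10$ and $\epsilon = 0.1$ the parameters are $m = 2$ and $x = 2$. A directed cycle is always accepted, in line with the one-sided error of the tester, while the edgeless digraph is always rejected because no search can reach a second vertex.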

Proof: Since $1/\binom{n}{2} < \epsilon < 2/n$, $x \le n$. For a strongly connected digraph $G$, given an arbitrary vertex $u$ of $V$, a BFS always reaches at least $x$ vertices by starting from $u$. In other words, our algorithm never rejects $G$, a yes instance; hence it has only one-sided error.

For a no instance $G$, according to Lemma 5, there are at least $(\epsilon n^2)/2$ strongly connected components, each containing fewer than $x$ vertices. The probability that an arbitrarily chosen vertex belongs to one of these $(\epsilon n^2)/2$ strongly connected components is at least $\frac{(\epsilon n^2)/2}{n} = (\epsilon n)/2$. Our algorithm outputs ACCEPT for a no instance only when the while-loop executes $m$ times. The probability is at most $[1 - (\epsilon n)/2]^m = [1 - (\epsilon n)/2]^{\lceil \log_{1-(\epsilon n)/2}(1/3) \rceil} \le 1/3$. Hence, $G$ is rejected with probability at least 2/3.

Finding all of a vertex's incoming (outgoing) neighbors takes at most $n$ queries on the incoming (outgoing, respectively) edges given the adjacency matrix. Since the while-loop is repeated at most $m$ times, the query complexity is bounded by:

$$
m \cdot 2 \cdot \left\lceil \frac{1}{n}\left(\frac{2}{\epsilon} - 1\right)\right\rceil \cdot n
< \left[\log_{1-(\epsilon n)/2}\frac{1}{3} + 1\right] \cdot \left[\frac{2}{n}\left(\frac{2}{\epsilon} - 1\right)\right] \cdot n
= O\left(\frac{\log_{1-(\epsilon n)/2}\frac{1}{3}}{\epsilon}\right).
$$

Q.E.D.

According to Theorem 6, the query complexity of the CONN tester is much higher than that of the BR tester. If the indegree and the outdegree of an input digraph are both bounded by a given constant, we can use the BR tester to test the strong connectivity of the input in order to reduce the query complexity; otherwise, we use the CONN tester. In the next chapter, we investigate how to test whether a digraph contains $H$-free $k$-induced subgraphs. If we do not know the upper bounds on the indegree and the outdegree of an input digraph, our result in the next chapter can help us determine which algorithm to use for testing the strong connectivity of the input.

## Chapter 5 Testing Whether a Digraph Contains $H$-free $k$-induced Subgraphs

### 5.1 Existence of $H$-free $k$-induced subgraphs is $\Omega(N^2)$-evasive

In this section, we show that the query complexity of any deterministic algorithm for the existence of $H$-free $k$-induced subgraphs is $\Omega(N^2)$.

First, we need some results concerning Turán numbers. For any integer $N$ and a fixed graph $H$, let $\mathrm{ex}(N, H)$ denote the maximum number of edges that an $N$-vertex graph may have if it contains no isomorphic copy of $H$. This is the Turán number of $H$. Furthermore, we will denote by $b_{r,s}$ the complete undirected bipartite graph between a set of $r$ vertices and another set of $s$ vertices. The following fact is well-known.

Fact 7 ([54]) For $r \le s$, $\mathrm{ex}(N, b_{r,s}) = O(N^{2-(1/r)})$.

If we replace the undirected edges of $b_{r,s}$ by directed edges with an arbitrary direction, a complete bipartite digraph $d_{r,s}$ results. The next theorem shows that it is $\Omega(N^2)$-evasive to determine if there is a $d_{r,s}$-free $k$-induced subgraph in a digraph. In our model, whenever an algorithm queries a pair of vertices $x, y$ in the input graph, it actually means that the algorithm queries the existence of edges $(x, y)$ and $(y, x)$ simultaneously. For a set $S$, we say that a subset $T \subseteq S$ is a $k$-subset of $S$ if $|T| = k$. Suppose a digraph $G$ contains a subgraph isomorphic to a digraph $H$. Then we say $G$ contains a copy of $H$.

Theorem 8 For any constant $\rho < 1$, $k < N/2$ and any complete bipartite digraph $d_{r,s}$, no algorithm can determine whether a digraph contains a $d_{r,s}$-free $k$-induced subgraph with query complexity $\le \rho\binom{k}{2}$ if $k = \lambda N$ with $\lambda$ being a constant.

Proof: Suppose there exists an algorithm $A$ that determines if a digraph contains a $d_{r,s}$-free $k$-induced subgraph with $\rho\binom{k}{2}$ queries. For the rest of the proof, assume $k = O(N)$ and $k$ is large enough so that

$$
(1 - \rho)\binom{k}{2} \ge \mathrm{ex}(k, b_{r,s}) = O(k^{2-(1/r)}). \tag{1}
$$

Start with a digraph $G_1$ with $N$ vertices that contains no copies of $d_{r,s}$ (this is easy to construct). Let $G_1$ be the input of $A$. Obviously, all $k$-induced subgraphs of $G_1$ are $d_{r,s}$-free. Let $G_2 = (V_2, E_2)$ be a graph with $N$ isolated vertices. Every time $A$ queries a pair of vertices $x, y$ in $G_1$, we add that edge to $G_2$ if there is an edge between them. When $A$ stops, the resulting $G_2$ has no $k$-induced subgraphs which contain $d_{r,s}$, just like $G_1$. For those vertex pairs of $G_1$ that are not queried by $A$, we add an edge (but without the directions) to $G_2$. For each $k$-induced subgraph of $G_2$, at least $(1 - \rho)\binom{k}{2}$ undirected edges are added. According to Fact 7, every $k$-induced subgraph of $G_2$ must contain a copy of $b_{r,s}$ with the undirected edges alone because of Eq. (1).

Now, we select a $k$-induced subgraph $K_1$ in $G_2$ and replace one copy of $b_{r,s}$ in $K_1$ by $d_{r,s}$. Let $V_{b,1}$ be the vertex set of this copy of $d_{r,s}$, and define $h = |V_{b,1}| = r + s$. For each subset of $V_2$ with size $k$ that contains $V_{b,1}$, its induced subgraph has a copy of $d_{r,s}$ too. There are $\binom{N-h}{k-h}$ such $k$-subsets of $V_2$ that contain $V_{b,1}$. Let $k/N = \lambda$. Recall that $\lambda$ is a constant. Now, the ratio of the number of all such $k$-subsets to the number of $k$-induced subgraphs of $G_2$ is $\binom{N-h}{k-h}/\binom{N}{k}$. Note that

$$\frac{\binom{N-h}{k-h}}{\binom{N}{k}} = \frac{k(k-1)\cdots(k-h+1)}{N(N-1)\cdots(N-h+1)}.$$

As $h$ is a constant and $k < N/2$, it is not hard to prove that there is a number $m > 0$ such that for every $N > m$ it holds that

$$\frac{k}{N} > \frac{k-1}{N-1} > \cdots > \frac{k-h+2}{N-h+2} > \frac{k-h+1}{N-h+1} = \frac{(k/N) - (h/N) + 1/N}{1 - (h-1)/N} > \frac{\lambda}{1+\lambda}.$$

Thus if $N$ is large enough,

$$\frac{\binom{N-h}{k-h}}{\binom{N}{k}} = \frac{k(k-1)\cdots(k-h+1)}{N(N-1)\cdots(N-h+1)} > \left(\frac{\lambda}{1+\lambda}\right)^{h}.$$
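The identity and the lower bound on the fraction of $k$-subsets containing a fixed $h$-set can be checked numerically. The following is a small sketch; the helper names `containing_ratio`, `falling_ratio` and `lower_bound` are hypothetical and chosen only for this illustration.

```python
from math import comb, isclose


def containing_ratio(N, k, h):
    """Fraction of k-subsets of an N-set that contain a fixed h-subset."""
    return comb(N - h, k - h) / comb(N, k)


def falling_ratio(N, k, h):
    """k(k-1)...(k-h+1) / (N(N-1)...(N-h+1)), computed factor by factor."""
    ratio = 1.0
    for t in range(h):
        ratio *= (k - t) / (N - t)
    return ratio


def lower_bound(lam, h):
    """The bound (lambda / (1 + lambda))^h used in the proof."""
    return (lam / (1 + lam)) ** h


if __name__ == "__main__":
    N, k, h = 1000, 300, 4
    print(containing_ratio(N, k, h), falling_ratio(N, k, h), lower_bound(k / N, h))
```

For a fixed $\lambda = k/N$ each factor $(k-t)/(N-t)$ tends to $\lambda$ as $N$ grows, which is why the bound $\lambda/(1+\lambda) < \lambda$ holds only once $N$ is large enough.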

We conclude that at least $\left(\frac{\lambda}{1+\lambda}\right)^{h}\binom{N}{k}$ $k$-induced subgraphs contain a copy of $d_{r,s}$.

Next we select another $k$-induced subgraph $K_2 = (V', E')$ with $V' \cap V_{b,1} = \emptyset$. It is worth noting that $K_2$ also has a copy of $b_{r,s}$; let $V_{b,2}$ be the vertex set of this copy. As before, we replace this copy of $b_{r,s}$ in $K_2$ by $d_{r,s}$. There are $\binom{N-2h}{k-h}$ $k$-subsets of $V_2$ that contain $V_{b,2}$ and are disjoint from $V_{b,1}$. The ratio of the number of all such $k$-subsets to the number of $k$-induced subgraphs of $G_2$ is $\binom{N-2h}{k-h}/\binom{N}{k}$. Again, for $N$ large enough,

$$\lim_{N\to\infty}\frac{\binom{N-2h}{k-h}}{\binom{N}{k}} = \lim_{N\to\infty}\frac{\binom{N-h}{k-h}}{\binom{N}{k}} > \left(\frac{\lambda}{1+\lambda}\right)^{h}.$$

We claim that in general, for every constant $i$,

$$\frac{\binom{N-ih}{k-h}}{\binom{N}{k}} = \frac{k(k-1)\cdots(k-h+1)}{N(N-1)\cdots(N-h+1)} \cdot \frac{(N-k)\cdots(N-k-(i-1)h+1)}{(N-h)\cdots(N-ih+1)} > \left(\frac{\lambda}{1+\lambda}\right)^{h}. \tag{2}$$

To verify this, recall that as we showed before,

$$\frac{k(k-1)\cdots(k-h+1)}{N(N-1)\cdots(N-h+1)} > \left(\frac{\lambda}{1+\lambda}\right)^{h}.$$

As for

$$\frac{(N-k)\cdots(N-k-(i-1)h+1)}{(N-h)\cdots(N-ih+1)},$$

since

$$\frac{N-k}{N-h} > \frac{N-k-1}{N-h-1} > \cdots > \frac{N-k-(i-1)h+1}{N-ih+1},$$

we have

$$\frac{(N-k)(N-k-1)\cdots(N-k-(i-1)h+1)}{(N-h)(N-h-1)\cdots(N-ih+1)} > \left(\frac{N-k-(i-1)h+1}{N-ih+1}\right)^{(i-1)h}.$$

Now, with $k = \lambda N$, it is easy to see that

$$\frac{N-k-(i-1)h+1}{N-ih+1} > \frac{N-k-(i-1)h+1}{N} > (1-2\lambda),$$

where the last inequality is due to $k > (i-1)h - 1$. Hence, when we repeat the above process $i$ times, at least

$$\left[(1-2\lambda) + (1-2\lambda)^{2} + \cdots + (1-2\lambda)^{(i-1)h}\right]\left(\frac{\lambda}{1+\lambda}\right)^{h}\binom{N}{k} \tag{3}$$

$k$-induced subgraphs contain a copy of $d_{r,s}$. Recall that $k < N/2$. Hence $2\lambda < 1$ and formula (3) is less than $\frac{1}{2\lambda}\left(\frac{\lambda}{1+\lambda}\right)^{h}\binom{N}{k}$.

Since $\lambda$, $h$ and $(\lambda/(1+\lambda))^{-h}$ are constants, we can repeat this process $2\lambda(\lambda/(1+\lambda))^{-h}$ times such that $V_{b,i} \cap V_{b,j} = \emptyset$ for $i \ne j$ and $N$ large enough. After having repeated this process that many times, we select $2\lambda(\lambda/(1+\lambda))^{-h}h < N$ distinct vertices from $V$ for $N$ large enough, and, by Eq. (2), the ratio of the number of $K_{2\lambda(\lambda/(1+\lambda))^{-h}}$ to the number of all $k$-induced subgraphs of $G_2$ will be at least $2\lambda(\lambda/(1+\lambda))^{-h}$. The number of $k$-induced subgraphs that contain a copy of $d_{r,s}$ then is at least

$$2\lambda\left(\frac{\lambda}{1+\lambda}\right)^{-h}\cdot\frac{(\lambda/(1+\lambda))^{h}}{2\lambda}\binom{N}{k} = \binom{N}{k}.$$

In other words, after we repeat this process $2\lambda(\lambda/(1+\lambda))^{-h}$ times and remove the remaining undirected edges, all $k$-induced subgraphs of $G_2$ will have a copy of $d_{r,s}$. This digraph $G_2$ contains, therefore, no $H$-free $k$-induced subgraph. However, $A$ cannot distinguish between $G_1$ and $G_2$ because we have only changed $G_2$'s unqueried edges. So, $A$ will accept $G_2$, which is a contradiction. *Q.E.D.*

### 5.2 Tester construction

Fix a digraph $H$ with $h$ vertices and $m \ge 1$ edges. Recall that $P_{k,H}$, where $k \ge h$, denotes the property that $G$ contains an $H$-free $k$-induced subgraph. We will show that property $P_{k,H}$ is testable with a query complexity independent of the input size. A set with size $n$ will be called an $n$-set, and a multiset with size $n$ will be called an $n$-multiset. There is a function $f(\varepsilon; H)$ with the following properties, which will be critical to our analysis later.

Theorem 9 *([7]) Let $H$ be a fixed digraph with $h$ vertices and $D$ be a digraph with $N$ vertices. If at least $\varepsilon N^{2}$ edges have to be removed from $D$ to make it $H$-free, then $D$ contains at least $f(\varepsilon; H)N^{h}$ copies of $H$.*

The following corollary is immediate.

Corollary 10 *Let $H$ be a fixed digraph with $h$ vertices and $m$ edges, $D$ be a digraph with $N$ vertices and $\sigma = \binom{\binom{h}{2}}{m}$. If at least $\varepsilon N^{2}$ edges have to be removed from $D$ to make it $H$-free, then $D$ contains at least $f(\varepsilon; H)N^{h}/\sigma$ $h$-sets whose induced subgraphs contain copies of $H$.*

Suppose the input $N$-vertex digraph $G = (V, E)$ is $\varepsilon$-far from having property $P_{k,H}$. Corollary 10 tells us that $G$ must contain at least $f(\varepsilon; H)N^{h}/\binom{\binom{h}{2}}{m}$ $h$-sets whose induced subgraphs contain copies of $H$. So to test property $P_{k,H}$ on $G$, our idea is to randomly select many $h$-sets from $V$.

Suppose $G$ contains an $H$-free $k$-induced subgraph, say $(V_k, E_k)$. Then with enough $h$-sets from $V$, at least one of them is expected to be a subset of $V_k$ with high probability. To verify if this is the case, we will check if an $h$-set $S$ satisfies $S \subseteq V_k$ in 2 steps. First, we check the induced subgraph of $S$. When $S \subseteq V_k$, the induced subgraph of $S$ contains no copies of $H$. If the induced subgraph of $S$ contains no copies of $H$, we randomly add a number of other vertices to $S$ (the number will be determined later) and check if there is a subset of $S$ (with a size to be determined later) whose induced subgraph contains no copies of $H$. If $S \subseteq V_k$, we expect that $S$ will pass both tests with high probability. Thus, $G$ will be accepted by our algorithm with high probability.

On the other hand, suppose $G$ is $\varepsilon$-far from any digraph which has property $P_{k,H}$. Then we expect to find a copy of $H$ in all the induced subgraphs of the above-mentioned $h$-sets $S$ with high probability. Our algorithm is detailed in Fig. 5.1.

We shall need the Chernoff bound in later analysis.

Theorem 11 *(Chernoff bound) Let $X = X_1 + X_2 + \cdots + X_n$ be a sum of $n$ independent random variables such that $0 < \Pr[X_i = 1] < 1$ holds for each $i = 1, 2, \ldots, n$ and $\mu = E[X]$. Then for any $0 < \Delta < 1$,*

$$\Pr[X < (1-\Delta)\mu] < e^{-\mu\Delta^{2}/2},$$

*where $e$ is the base of the natural logarithm.*

1: if $k < \sqrt{\varepsilon}N$ then

2: ACCEPT

3: end if

4: let $\lambda = k/N$, $\kappa = \log_{1-(\sqrt{\varepsilon})^{h}/2}(1/6)$, $\sigma = \binom{\binom{h}{2}}{m}$ and $\theta = \max\left\{\log_{6f(\varepsilon;H)h!/(\sigma\lambda^{2})}(2/3)^{1/\kappa},\ 1\right\}$

5: for $i = 1$ to $\kappa$ do

6: randomly select an $h$-set $S$ from $V$

7: if the induced subgraph of $S$ does not contain an $H$ then

8: randomly select additional vertices $p = 6\theta h/\lambda$ times (with replacements) from $V - S$ (assume these $p$ vertices to be $x_1, x_2, \ldots, x_p$) {note there are $\binom{p}{\theta h}$ $(\theta h)$-multisets in $\{x_1, x_2, \ldots, x_p\}$}

9: for $j = 1$ to $\binom{p}{\theta h}$ do

10: let $S_j$ be the $j$th $(\theta h)$-multiset selected in step 8

11: if the induced subgraph of $S_j \cup S$ contains no copies of $H$ then

12: ACCEPT

13: end if

14: end for

15: end if

16: end for

17: REJECT

*Figure 5.1: The $\varepsilon$-tester for property $P_{k,H}$.*
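The tester of Fig. 5.1 can be sketched in Python as follows. This is a minimal illustration, not the dissertation's implementation: the names `contains_copy` and `p_kh_tester` are hypothetical, the digraph is assumed to be given as a set of ordered pairs over vertices `0..n-1`, $H$ is assumed to have vertices `0..h-1`, the values of $\kappa$ and $\theta$ are passed in as integers (rounding them is an assumption), and the search for a copy of $H$ is plain brute force.

```python
import random
from itertools import combinations, permutations
from math import ceil, sqrt


def contains_copy(g_edges, vertices, h_edges, h):
    """Brute force: does the subgraph induced by `vertices` contain a copy of H?"""
    distinct = list(set(vertices))  # a multiset induces a subgraph on its distinct vertices
    for image in permutations(distinct, h):
        if all((image[a], image[b]) in g_edges for a, b in h_edges):
            return True
    return False


def p_kh_tester(n, g_edges, k, eps, h_edges, h, kappa, theta, rng=random):
    """Return True for ACCEPT and False for REJECT, following steps 1-17."""
    if k < sqrt(eps) * n:                          # steps 1-3
        return True
    lam = k / n
    p = ceil(6 * theta * h / lam)                  # sample size of step 8, rounded up
    for _ in range(kappa):                         # step 5
        s = rng.sample(range(n), h)                # step 6
        if contains_copy(g_edges, s, h_edges, h):  # step 7 fails: try another h-set
            continue
        rest = [v for v in range(n) if v not in s]
        xs = [rng.choice(rest) for _ in range(p)]  # step 8, with replacement
        for s_j in combinations(xs, theta * h):    # steps 9-10
            if not contains_copy(g_edges, list(s_j) + s, h_edges, h):  # step 11
                return True                        # step 12
    return False                                   # step 17
```

On a digraph with no edges the sketch accepts as soon as the first $h$-set is drawn, while on a complete digraph every $h$-set with $h \ge 2$ contains a copy of a single-edge $H$, so it rejects. The brute-force check costs time exponential in $h$, which is acceptable here only because $h$ is a constant.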

Note that in property $P_{k,H}$, $h$ is a constant. Hence $f(\varepsilon; H)$ is a function in $\varepsilon$ only. We assume that $H$ is a fixed digraph with $h$ vertices and $m$ edges and recall that $G$ is the input digraph with $N$ vertices from now on.

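Definition 12 *Let $0 < \varepsilon < 1$, $N, k \in \mathbb{N}$, $\lambda = k/N$, $H$ be a fixed digraph with $h$ vertices, $m$ be the number of edges in $H$, $\sigma = \binom{\binom{h}{2}}{m}$, $\kappa = \log_{1-(\sqrt{\varepsilon})^{h}/2}(1/6) = \Theta(1/\varepsilon^{h/2})$, and $\theta = \max\left\{\log_{6f(\varepsilon;H)h!/(\sigma\lambda^{2})}(2/3)^{1/\kappa},\ 1\right\} = \Theta(f(\varepsilon; H))$ when $f(\varepsilon; H)$ is only dependent on $1/\varepsilon$. If the value of $f(\varepsilon; H)$ is large enough such that*

$$\left(\frac{f(\varepsilon;H)\,h!}{\binom{\binom{h}{2}}{m}\lambda}\right)^{\theta} \ge (\lambda/6)^{\theta}(2/3)^{1/\kappa},$$

*then we say $f(\varepsilon; H)$ satisfies condition 1.*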

Fact 13 *([7]) For a connected $H$, $f(\varepsilon; H)$ has a polynomial dependency on $1/\varepsilon$ if and only if the core of $H$ is either an oriented tree or a directed cycle of length 2.*

By Fact 13, $f(\varepsilon; H)$ has a polynomial dependency on $1/\varepsilon$ for many $H$. Since the value of $f(\varepsilon; H)$ is independent of $h$ and $m$ and $\left(\frac{2}{3}\right)^{1/(\theta\kappa)} \le 1$, assuming $f(\varepsilon; H) = O((1/\varepsilon)^{j})$, we can find a smaller

$$\varepsilon = O\!\left(\left[\frac{\binom{\binom{h}{2}}{m}}{h!}\cdot\frac{\lambda^{2}}{6}\left(\frac{2}{3}\right)^{1/(\theta\kappa)}\right]^{-j}\right)$$

such that

$$f(\varepsilon; H) \ge \frac{\binom{\binom{h}{2}}{m}}{h!}\cdot\frac{\lambda^{2}}{6}\left(\frac{2}{3}\right)^{1/(\theta\kappa)},$$

i.e.,

$$\left(\frac{f(\varepsilon;H)\,h!}{\binom{\binom{h}{2}}{m}\lambda}\right)^{\theta} \ge (\lambda/6)^{\theta}(2/3)^{1/\kappa};$$

hence $f(\varepsilon; H)$ satisfies condition 1.

Claim 14 *Assume $0 < \varepsilon < 1$, $N, k \in \mathbb{N}$ and $k \ge \sqrt{\varepsilon}N$. Suppose the input digraph $G = (V, E)$ with $N$ vertices contains an $H$-free $k$-induced subgraph, say $K = (V_k, E_k)$. The probability of $S \subseteq V_k$ for a random $h$-subset $S \subseteq V$ is greater than $(\sqrt{\varepsilon})^{h}/2$ for $N$ large enough.*

*Proof:* The probability of $S \subseteq V_k$ for a random $h$-set $S$ is

$$\frac{\binom{k}{h}}{\binom{N}{h}} = \frac{k(k-1)\cdots(k-h+1)}{N(N-1)\cdots(N-h+1)}.$$

Since $k \ge \sqrt{\varepsilon}N$, the above probability is at least

$$\frac{\sqrt{\varepsilon}N(\sqrt{\varepsilon}N-1)\cdots(\sqrt{\varepsilon}N-h+1)}{N(N-1)\cdots(N-h+1)} > \frac{(\sqrt{\varepsilon})^{h}}{2}$$

for $N$ large enough. *Q.E.D.*
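The role of "for $N$ large enough" in Claim 14 can be seen numerically. The sketch below compares the exact probability $\binom{k}{h}/\binom{N}{h}$ with the bound $(\sqrt{\varepsilon})^{h}/2$; the helper names are hypothetical.

```python
from math import comb, sqrt


def subset_probability(N, k, h):
    """Exact probability that a uniformly random h-subset of V lies inside V_k."""
    return comb(k, h) / comb(N, h)


def claim14_bound(eps, h):
    """The lower bound (sqrt(eps))^h / 2 stated in Claim 14."""
    return sqrt(eps) ** h / 2


if __name__ == "__main__":
    eps, h = 0.04, 3
    for N in (10, 100, 10000):
        k = int(sqrt(eps) * N)  # smallest k allowed by the claim for these N
        print(N, subset_probability(N, k, h), claim14_bound(eps, h))
```

For tiny inputs such as $N = 10$, $k = 2$, $h = 3$ the probability is $0$ because $k < h$, so the bound fails; it holds once $N$ is large relative to $h$.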

Claim 15 *Let $0 < \varepsilon < 1$, $N, k \in \mathbb{N}$, $\lambda = k/N$, $H$ be a fixed digraph with $h$ vertices and $m$ be the number of edges in $H$. Suppose the input graph $G = (V, E)$ with $N$ vertices is $\varepsilon$-far from any digraph having property $P_{k,H}$. The probability of finding an $h$-set whose induced subgraph contains copies of $H$ is at least $f(\varepsilon; H)h!/\left[\binom{\binom{h}{2}}{m}\lambda\right]$.*

*Proof:* By Corollary 10, each $k$-induced subgraph of $G$ contains at least $f(\varepsilon; H)N^{h}/\binom{\binom{h}{2}}{m}$ $h$-sets whose induced subgraphs contain copies of $H$. Therefore, by dividing $V$ into $N/k$ $k$-sets, we can find at least

$$\left[\frac{f(\varepsilon;H)N^{h}}{\binom{\binom{h}{2}}{m}}\right](N/k) = \frac{f(\varepsilon;H)N^{h}}{\binom{\binom{h}{2}}{m}\lambda}$$

$h$-sets whose induced subgraphs contain copies of $H$ in $G$, and the probability of finding an $h$-set whose induced subgraph contains copies of $H$ is at least

$$\frac{f(\varepsilon;H)N^{h}\Big/\left[\binom{\binom{h}{2}}{m}\lambda\right]}{\binom{N}{h}} = \frac{f(\varepsilon;H)N^{h}\cdot\frac{1}{\lambda}\cdot h!}{N(N-1)\cdots(N-h+1)\binom{\binom{h}{2}}{m}} > \frac{f(\varepsilon;H)\,h!}{\binom{\binom{h}{2}}{m}\lambda}.$$

*Q.E.D.*

The following theorem proves the testability of $P_{k,H}$.

Theorem 16 *Let $0 < \varepsilon < 1$, $0 < k < N$ be an integer and $H$ be a fixed digraph. If $f(\varepsilon; H)$ satisfies condition 1, the property $P_{k,H}$ is testable with a query complexity independent of the input size.*

*Proof:* Suppose $k < \sqrt{\varepsilon}N$. Then the number of edges in a $k$-induced subgraph is less than $\varepsilon N^{2}$. The input graph $G$, therefore, cannot be $\varepsilon$-far from any digraph which has property $P_{k,H}$, and we can simply accept it. Assume $k \ge \sqrt{\varepsilon}N$ for the rest of the proof.

Suppose the input digraph $G = (V, E)$ contains an $H$-free $k$-induced subgraph, say $K = (V_k, E_k)$. The probability that the algorithm accepts $G$ is at least the probability that it selects a subset of $V_k$ in step 6 of the algorithm in Fig. 5.1 and then accepts in step 12 for some $j$.

By Claim 14, the probability of $S \not\subseteq V_k$ is at most $1 - (\sqrt{\varepsilon})^{h}/2$. As we independently select $\kappa$ $h$-sets $S$, the probability of $S \not\subseteq V_k$ for all $\kappa$ of them is at most $\left[1 - (\sqrt{\varepsilon})^{h}/2\right]^{\kappa} = 1/6$. Assume $S \subseteq V_k$ from now on. We randomly select $p$ other vertices (with replacements) in step 8. Denote the $j$th such $(\theta h)$-multiset by $S_j$. The algorithm then checks if the induced subgraph of $S_j \cup S$ contains a copy of $H$.

Let event $B$ mean $S_j \cup S$ contains a copy of $H$ for all $j$. Given $S \subseteq V_k$, if at least $\theta h$ vertices are selected from $V_k$ in step 8, then event $B$ will not occur (note that $\theta \ge 1$). Thus the probability of event $B$ is at most the probability that the algorithm selects fewer than $\theta h$ vertices from $V_k$ in step 8. Let $y$ be the number of vertices of these $p$ vertices selected in step 8 that belong in $V_k$ (with multiplicity counted). Then $\Pr[\text{event } B] \le \Pr[y < \theta h]$. We estimate the upper bound of the above probability by the Chernoff bound. As the probability of selecting a vertex in $V_k$ is $k/N = \lambda$ and the total number of selections is $p = 6\theta h/\lambda$, we have $\mu = E[y] = (6\theta h/\lambda)\lambda = 6\theta h$. Rewrite $\Pr[y < \theta h]$ as $\Pr[y < (1-\Delta)6\theta h]$, where $\Delta = 5/6$. By the Chernoff bound,

$$\Pr[\text{event } B] \le e^{-\mu\Delta^{2}/2} = e^{-6\theta h(5/6)^{2}/2} = e^{-25\theta h/12}.$$

Since $\theta h > 1$, $\Pr[\text{event } B] < e^{-2} < 1/6$. Hence the probability that we select an $h$-set from $V_k$ in step 6 that leads to acceptance in step 12 is at least $(1 - 1/6)(1 - 1/6) > 2/3$. The probability that a digraph $G$ which has property $P_{k,H}$ will be rejected is thus less than $1/3$. See Fig. 5.2 for illustration.
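The Chernoff step can be sanity-checked by comparing the bound $e^{-\mu\Delta^{2}/2}$ with the exact binomial lower tail $\Pr[y < \theta h]$ for $y \sim \mathrm{Bin}(p, \lambda)$. The sketch below does this for one choice of parameters; the function names and the sample values ($\lambda = 0.5$, $\theta h = 2$) are illustrative assumptions.

```python
from math import comb, exp


def chernoff_lower_tail(mu, delta):
    """Chernoff bound exp(-mu * delta^2 / 2) on Pr[X < (1 - delta) * mu]."""
    return exp(-mu * delta ** 2 / 2)


def binomial_lower_tail(trials, lam, threshold):
    """Exact Pr[y < threshold] for y ~ Binomial(trials, lam)."""
    return sum(
        comb(trials, y) * lam ** y * (1 - lam) ** (trials - y)
        for y in range(threshold)
    )


if __name__ == "__main__":
    lam, theta_h = 0.5, 2
    p = int(6 * theta_h / lam)  # 24 draws in step 8
    mu = p * lam                # expected number of draws landing in V_k
    print(binomial_lower_tail(p, lam, theta_h), chernoff_lower_tail(mu, 5 / 6))
```

In this instance the exact tail is far smaller than the Chernoff bound, which in turn is below the $1/6$ needed in the proof.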

On the other hand, suppose the input graph $G = (V, E)$ is $\varepsilon$-far from any digraph which has property $P_{k,H}$. Obviously, the probability that the algorithm accepts is equal to the probability that we find an $h$-set $S$ whose induced subgraph does not contain an $H$, and after we randomly select $p$ additional vertices (with replacements), there exists a $(\theta h)$-multiset $S_j$ from those $p$ selected vertices such that the induced subgraph of $S_j \cup S$ contains no copies of $H$. By Claim 15, the probability of finding an $h$-set that contains copies of $H$ is at least $f(\varepsilon; H)h!/\left[\binom{\binom{h}{2}}{m}\lambda\right]$. For each $(\theta h)$-multiset $S_j$, at least $\theta$ disjoint $h$-sets are checked; hence the probability that $S_j \cup S$ contains copies of $H$ is at least

$$\left(\frac{f(\varepsilon;H)\,h!}{\binom{\binom{h}{2}}{m}\lambda}\right)^{\theta} = (\lambda/6)^{\theta}\cdot(2/3)^{1/\kappa}.$$

We then test $\binom{p}{\theta h}$ $(\theta h)$-multisets in step 12. Since

$$\binom{p}{\theta h} = \frac{(6\theta h/\lambda)!}{(\theta h)!} = \frac{(6\theta h/\lambda)[(6\theta h/\lambda)-1]\cdots[(6\theta h/\lambda)-\theta h]}{(\theta h)!} > (6/\lambda)^{\theta h} > (6/\lambda)^{\theta},$$

the probability that the induced subgraph of $S_j \cup S$ contains copies of $H$ for all $j$ is at least $(6/\lambda)^{\theta}\cdot(\lambda/6)^{\theta}\cdot(2/3)^{1/\kappa} = (2/3)^{1/\kappa}$. So, for each $h$-set $S$ that passes the test in step 7, the probability that $S$ does not lead to acceptance in step 12 is at least $(2/3)^{1/\kappa}$. Hence, regardless of whether $S$ passes the test in step 7, the probability that none of the $S$ leads to acceptance in step 12 is at least $\left[(2/3)^{1/\kappa}\right]^{\kappa} = 2/3$. Therefore, the probability that the algorithm accepts the input is less than $1/3$.

The query complexity of step 7 is $O(h^{2})$ and the query complexity from step 9 to step 10 is $O\!\left(\binom{p}{\theta h}\binom{\theta h}{2}\right)$. Since $\binom{p}{\theta h}\binom{\theta h}{2} > h^{2}$, the query complexity is $O\!\left(\kappa\binom{p}{\theta h}\binom{\theta h}{2}\right)$. This value is independent of $N$. Hence the theorem follows. *Q.E.D.*
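As a worked instance of the final bound $\kappa\binom{p}{\theta h}\binom{\theta h}{2}$, the following sketch evaluates the expression for given integer parameters. The function name `query_bound` and the sample values are assumptions for illustration; the constant hidden by the $O(\cdot)$ notation is ignored.

```python
from math import comb


def query_bound(kappa, p, theta_h):
    """Evaluate kappa * C(p, theta*h) * C(theta*h, 2) for integer inputs."""
    return kappa * comb(p, theta_h) * comb(theta_h, 2)


if __name__ == "__main__":
    # Example: kappa = 14 rounds, p = 24 sampled vertices, theta * h = 2.
    print(query_bound(14, 24, 2))
```

None of the three arguments depends on $N$ once $\varepsilon$, $H$ and $\lambda = k/N$ are fixed, which is the point of the theorem.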

The value of $f(\varepsilon; H)$ decreases extremely fast with $\varepsilon$, and is independent of the input size [7]. Although it is difficult to compute the exact value of $f(\varepsilon; H)$ in general, we can estimate a lower bound of $f(\varepsilon; H)$ by Szemerédi's regularity lemma, and $[(1-\varepsilon)/(2+h)]^{h}$ is one such lower bound. In our algorithm in Fig. 5.1, $f(\varepsilon; H)$ is just a coefficient. The soundness of our algorithm in Fig. 5.1 is proved in Theorem 16. We can replace $f(\varepsilon; H)$ by $[(1-\varepsilon)/(2+h)]^{h}$ in step 4 of our algorithm in Fig. 5.1 without changing the validity of Theorem 16. The consequence is that our algorithm needs to query more edges in the input digraph, but the total number of queried edges remains independent of the input size.

[Figure 5.2 (flowchart): from the input digraph, randomly select an $h$-set; if the $h$-set is $H$-free, randomly select $6/\lambda$ $(\theta h)$-sets and test whether the induced subgraph of the union of the $(\theta h)$-set and the $h$-set is $H$-free.]

*Figure 5.2: Testing the property $P_{k,H}$.*

Let $k > \sqrt{\varepsilon}N$, $\varepsilon$ be a constant and $H_1 = (V_1, E_1)$ be a digraph where $V_1 = \{v_1, v_2, \ldots, v_{d+1}\}$ and $E_1 = \{(v_1, v_{d+1}), (v_2, v_{d+1}), \ldots, (v_d, v_{d+1})\}$. It is commonly called a star graph. We can use the algorithm in Fig. 5.1 to test whether the input digraph contains an $H_1$-free $k$-induced subgraph. Obviously if a digraph is accepted by our algorithm, then the maximum indegree of this digraph is bounded by $d$ with high probability. Similarly, let $H_2 = (V_1, E_2)$ be a digraph where $E_2 = \{(v_{d+1}, v_1), (v_{d+1}, v_2), \ldots, (v_{d+1}, v_d)\}$. We can use the algorithm in Fig. 5.1 to test whether the maximum outdegree of the input digraph is bounded by $d$ with high probability. If an input digraph is accepted by our algorithm for both $H_1$ and $H_2$, then we know that this digraph satisfies the restrictions of the BR tester. In this case, we use the BR tester to test strong connectivity of the input digraph.

The total query complexity of testing strong connectivity is the sum of the query complexities of the algorithm in Fig. 5.1 and the BR tester. Since the query complexities of the algorithm in Fig. 5.1 and the BR tester are both independent of the input size, their sum remains independent of the input size. The query complexity of our strong connectivity tester in Fig. 4.2 is the square root of the input size. Hence, the sum of the query complexities of the algorithm in Fig. 5.1 and the BR tester is less than the query complexity of our strong connectivity tester in Fig. 4.2. Since the main efficiency parameter of a method to solve a property testing problem is its query complexity, our strong connectivity tester is not the most efficient one for all digraphs. It is better to use the algorithm in Fig. 5.1 to determine which tester (the BR tester or our strong connectivity tester) should be used to test the strong connectivity of the digraph.
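The selection procedure described above can be sketched as follows. This is only an outline under stated assumptions: `star_digraphs` and `choose_strong_connectivity_tester` are hypothetical names, vertices $v_1, \ldots, v_{d+1}$ are represented as `0..d`, and `accepts_h_free` stands for a run of the tester in Fig. 5.1 on the input digraph with the given forbidden digraph.

```python
def star_digraphs(d):
    """Edge lists of H1 (all edges point into the centre) and H2 (all point out).

    The centre v_{d+1} is represented by the index d.
    """
    h1 = [(i, d) for i in range(d)]
    h2 = [(d, i) for i in range(d)]
    return h1, h2


def choose_strong_connectivity_tester(accepts_h_free, d):
    """Pick a tester based on two runs of the H-free subgraph tester.

    accepts_h_free(h_edges) is assumed to return True when the tester of
    Fig. 5.1 accepts the input digraph for the forbidden digraph h_edges.
    """
    h1, h2 = star_digraphs(d)
    if accepts_h_free(h1) and accepts_h_free(h2):
        return "BR tester"      # degree restrictions appear to hold
    return "general tester"     # fall back to the tester of Fig. 4.2
```

Both runs of the subgraph tester cost a number of queries independent of the input size, so the selection step does not change the overall query complexity when the BR tester is chosen.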