A competitive algorithm in searching for many edges in a hypergraph


www.elsevier.com/locate/dam

Note


Ting Chen^{a,∗}, Frank K. Hwang^{b}

a College of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, PR China
b Department of Applied Mathematics, National Chiaotung University, Hsinchu, Taiwan, ROC^{1}

Received 3 October 2004; received in revised form 23 March 2006; accepted 11 July 2006

Available online 10 October 2006

Abstract

We give a competitive algorithm to identify all d defective edges in a hypergraph with d unknown. Damaschke solved the d = 1 case for 2-graphs, Triesch extended the d = 1 case to r-graphs, and Johann solved the general d case for 2-graphs. Thus ours is the first attempt to solve the problem of searching for defective edges in its full generality. Moreover, all three of these papers assumed that d is known, whereas we give a competitive algorithm for unknown d.

Keywords: Competitive algorithm; Hypergraph; Edge-searching problem; Group testing; Graph searching

1. Introduction

A hypergraph is of rank r if the maximum number of vertices in an edge is r. In particular, if every edge has r vertices, then it is known as an r-graph. We represent a hypergraph G by its edge set E, while V(E) denotes the set of vertices in E. We also let G(S) denote the subgraph of G induced by the set S of vertices.

In our problem, we want to identify a subset D ⊆ E of defective edges with a minimum number of edge tests, where an edge test takes an arbitrary subset S of V(E) and asks whether the subgraph G(S) contains a defective edge. Note that the edge test is an extension of the group test, which has a much longer history [4]. In a group-testing problem, a vertex is either defective or good. A group test takes an arbitrary subset S of V and asks whether S contains a defective vertex. The group-testing problem can be interpreted as an edge-testing problem on a 1-graph, where a 1-edge is just a vertex.
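For concreteness, the two test models can be written as a pair of predicates. This is a minimal Python sketch, assuming the (in practice unknown) defective set D is given explicitly as frozensets of vertex labels; the names `edge_test` and `group_test` are our own illustrations, not notation from the paper.

```python
from typing import AbstractSet, FrozenSet, Iterable

Edge = FrozenSet[int]


def edge_test(S: Iterable[int], defective: AbstractSet[Edge]) -> bool:
    """Edge test: does the subgraph G(S) induced by S contain a defective edge?

    An edge lies in G(S) exactly when all of its vertices are in S.
    """
    S = frozenset(S)
    return any(e <= S for e in defective)


def group_test(S: Iterable[int], defective_vertices: Iterable[int]) -> bool:
    """Group test, viewed as an edge test on a 1-graph (each 1-edge is a vertex)."""
    return edge_test(S, {frozenset({v}) for v in defective_vertices})


D = {frozenset({1, 2}), frozenset({3, 4, 5})}
print(edge_test({1, 2, 9}, D))  # True: {1, 2} lies inside S
print(edge_test({1, 3, 4}, D))  # False: no defective edge fits entirely in S
print(group_test({2, 7}, {7}))  # True
```

Note that a test can be negative even when S meets every defective edge, which is what distinguishes edge testing from group testing on the vertices.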

∗ Corresponding author.
E-mail addresses: steafting@sina.com (T. Chen), fhwang@math.nctu.edu.tw (F.K. Hwang).
1 This research was done when the second author was visiting the Center of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, PR China. The first author was supported by the National Natural Science Foundation of China (10271110).

For a brief history of edge testing, Chang and Hwang [2] first cast a special group-testing problem as searching for a single edge in a complete bipartite graph. Since a group test is convertible to an edge test in the single-edge case, this problem can be interpreted as the first formulation of an edge-testing problem. However, Aigner [1] was the first to consciously propose the edge-testing problem for general 2-graphs, thus bringing the “graph” into focus. Let M(E, d) denote the minimum number of edge tests that guarantees finding the d defective edges in E. Aigner conjectured that for 2-graphs

$$M(E, 1) \le \lceil \log_2 |E| \rceil + c,$$

where c is a constant.

Damaschke [3] proved the conjecture with c = 1. Triesch [8] generalized it to hypergraphs (of rank r) with c = r − 1.

Du and Hwang [4] conjectured that for 2-graphs

$$M(E, d) \le d \log_2 \frac{|E|}{d} + c,$$

where c is a constant.

Recently, Johann [6] made a breakthrough on the d > 1 problem by proving that for 2-graphs

$$M(E, d) \le d \log_2 \frac{|E|}{d} + 7.$$

All the above results assume that d is known. For unknown d, Hwang [5] gave a competitive algorithm for 2-graphs requiring d(log2|E| + 4) tests.

In this paper, we study the problem of searching for all defective edges in an r-graph and give a competitive algorithm (guaranteeing its performance against all d) requiring

$$d\lceil \log_2 |E| \rceil + (r-1)\binom{r}{2} d^r + o(d^r)$$

tests, where r is assumed to be a small constant. Note that the information-theoretic lower bound with d known is

$$\log_2 \binom{|E|}{d} \sim d(\log_2 |E| - \log_2 d).$$

We then extend the algorithm to general hypergraphs.
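To get a feel for the lower bound, the following sketch computes log2 C(|E|, d), rounded up to a whole number of tests, exactly with integer arithmetic, and compares it with the approximation d(log2|E| − log2 d). The function names are hypothetical helpers for illustration only.

```python
import math


def info_lower_bound(num_edges: int, d: int) -> int:
    """ceil(log2 C(|E|, d)), computed exactly with integer arithmetic."""
    count = math.comb(num_edges, d)  # number of possible defective sets D
    return (count - 1).bit_length()  # smallest t with 2**t >= count


def approx_lower_bound(num_edges: int, d: int) -> float:
    """The asymptotic form d * (log2|E| - log2 d), meaningful when d << |E|."""
    return d * (math.log2(num_edges) - math.log2(d))


for d in (1, 2, 8):
    print(d, info_lower_bound(1024, d), round(approx_lower_bound(1024, d), 1))
```

Using `bit_length` instead of a floating-point `log2` avoids rounding errors when C(|E|, d) is very large.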

To motivate our problem, we cite a biological application [7]. Many biological phenomena occur due to the joint presence of several molecules. Suppose there exists a set D where each member is a set of molecules whose joint presence would cause a particular biological phenomenon. Then we can construct a hypergraph by taking the set of relevant molecules as vertices, all subsets of vertices which are candidates for members of D as edges, and the unknown set D as the set of defective edges to be identified. There exist biological experiments which can recognize the existence of a set of molecules causing the particular phenomenon.

2. The general approach for r-graphs

We follow Johann’s approach in general, but also have to resolve some problems unique to r-graphs for r > 2 and to competitive algorithms. Three important issues are the following:

(i) After a positive test (a test with a positive outcome), how do we identify a defective edge efﬁciently?

(ii) A defective edge can be removed only by removing some vertices in it, which is undesirable since there may be other defective edges incident to these vertices. But if we do not remove defective edges, how do we avoid repetitively identifying the same defective edge?

(iii) How do we analyze the number of tests?

We deal with these issues one by one.

(i) Johann commented that, for r = 2, Triesch’s algorithm, which finds the single defective edge in E with ⌈log2|E|⌉ + 1 tests, works with a little modification even if E contains many defective edges. In a private communication, Johann revealed that the modification works for general hypergraphs of rank r.

In the following, we describe Johann’s edge-testing version of Triesch’s algorithm. Triesch used group testing to describe the testing scheme, which is known to be convertible to edge testing if |D| = 1. To be more specific, let C = (v1, . . . , vc) be a vertex cover of E. Then each edge contains a vertex of C. We say an edge e is led by vi if e contains vi but no vj ∈ C with j < i. Triesch’s group tests are always on {v1, . . . , vj} for some j, which can be converted to


identiﬁed. Then the problem is reduced to ﬁnding a defective (r − 1)-edge which can be done by induction. We will refer to Johann’s edge-testing version as the TJ-procedure. We now formally state the TJ-procedure. Let V denote the vertex set of E.

Step 1: Use the greedy algorithm (as specified by Triesch) to construct a vertex cover C = (v1, . . . , vc) of E with d(v1) ≥ d(v2) ≥ · · · ≥ d(vc), where d(vi) is the degree of vertex vi in the graph obtained from E by deleting v1, . . . , vi−1 and their edges. Let T be a rooted binary tree with leaves v1, . . . , vc from left to right in that order, in which the path to vi has length ⌈log2(|E|/d(vi))⌉, 1 ≤ i ≤ c (the existence of T is guaranteed by Kraft’s inequality, since d(v1) + · · · + d(vc) = |E|). Let TL and TR denote the left and right subtrees of T, and VL and VR their leaf sets, respectively.

Step 2: Test V \ VL. If positive, set T = TR and V = V \ VL; if negative, set T = TL.

Step 3: If T has more than one leaf, go back to Step 2.

Step 4: Output the only leaf in T as the leader of a defective edge.
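The TJ-procedure can be sketched in Python as follows. This is our reading of Steps 1–4 plus the reduction to the (r − 1)-graph of edges led by the output leader, not the authors’ code, and it assumes a uniform r-graph with integer vertex labels, an oracle `test(S)` answering edge tests, and a positive test on V(E) to start with. The tree T is realized as a canonical prefix code whose codeword lengths are ⌈log2(|E|/d(vi))⌉ (0 = left, 1 = right); such a code exists by Kraft’s inequality, and because the lengths never decrease the leaves come out in cover order. The helper names `greedy_cover`, `kraft_codes`, `find_leader` and `tj_procedure` are hypothetical, and the sketch does not try to reproduce the exact test count of Lemma 1.

```python
from fractions import Fraction
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

Edge = FrozenSet[int]
Oracle = Callable[[FrozenSet[int]], bool]


def greedy_cover(edges: List[Edge]) -> Tuple[List[int], List[int]]:
    """Greedy cover v1, v2, ... with residual degrees d(v1) >= d(v2) >= ..."""
    remaining, cover, degs = list(edges), [], []
    while remaining:
        counts: Dict[int, int] = {}
        for e in remaining:
            for v in e:
                counts[v] = counts.get(v, 0) + 1
        v = max(counts, key=lambda u: (counts[u], -u))  # ties: smallest label
        cover.append(v)
        degs.append(counts[v])
        remaining = [e for e in remaining if v not in e]  # delete v and its edges
    return cover, degs


def kraft_codes(degs: List[int], m: int) -> List[str]:
    """Root-to-leaf paths of T with lengths ceil(log2(m / d(vi)))."""
    codes, acc = [], Fraction(0)
    for d in degs:
        length = 0
        while (1 << length) * d < m:  # smallest length with 2**length >= m / d
            length += 1
        word = int(acc * (1 << length))  # exact, as lengths never decrease
        codes.append(format(word, "b").zfill(length) if length else "")
        acc += Fraction(1, 1 << length)
    assert acc <= 1  # Kraft's inequality holds because sum(degs) == m
    return codes


def find_leader(cover: List[int], code: Dict[int, str], V: FrozenSet[int],
                test: Oracle) -> Tuple[int, FrozenSet[int]]:
    """Steps 2-4: walk down T and return (leader, final vertex set V)."""
    prefix = ""  # current subtree = leaves whose path starts with prefix
    while True:
        under = [v for v in cover if code[v].startswith(prefix)]
        if len(under) == 1:
            return under[0], V  # Step 4
        left = {v for v in under if code[v].startswith(prefix + "0")}
        if len(left) == len(under):  # empty right subtree: no test needed
            prefix += "0"
        elif not left:  # empty left subtree: no test needed
            prefix += "1"
        elif test(V - left):  # Step 2: test V \ V_L
            V, prefix = V - left, prefix + "1"
        else:
            prefix += "0"


def tj_procedure(edges: List[Edge], test: Oracle,
                 K: FrozenSet[int] = frozenset()) -> FrozenSet[int]:
    """Return one defective edge; K (leaders found so far) rides on every test."""
    cover, degs = greedy_cover(edges)
    code = dict(zip(cover, kraft_codes(degs, len(edges))))
    v, V = find_leader(cover, code, frozenset().union(*edges), lambda S: test(K | S))
    earlier = set(cover[: cover.index(v)])
    # (r-1)-graph: edges inside V led by v, with v itself removed
    rest = [e - {v} for e in edges if v in e and e <= V and not (e & earlier)]
    if all(not e for e in rest):
        return K | {v}
    return tj_procedure([e for e in rest if e], test, K | {v})


EDGES = [frozenset(c) for c in combinations(range(6), 2)]  # K_6 as a 2-graph
DEFECTIVE = {frozenset({1, 4}), frozenset({2, 5})}


def oracle(S: FrozenSet[int]) -> bool:
    return any(x <= S for x in DEFECTIVE)


print(tj_procedure(EDGES, oracle))  # frozenset({2, 5})
```

With the defective edges {1, 4} (leader 1) and {2, 5} (leader 2), the sketch returns {2, 5}, consistent with the observation that the edge-testing version finds the defective edge with the maximum leader.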

Note that Triesch’s procedure always identifies the defective edge with the minimum leader, while Johann’s edge-testing version identifies the one with the maximum leader. When d = 1, the minimum leader is the maximum leader, so both procedures reduce to the same (r − 1)-graph. But for d > 1, they reduce to different (r − 1)-graphs. So not only can the numbers of tests needed to find the minimum and the maximum leader differ, but the subsequent reductions can also lead to different numbers of tests. However, these differences disappear in our worst-case analysis. By the above explanation, and since Triesch’s algorithm requires at most ⌈log2|E|⌉ + r tests, we have:

Lemma 1 (TJ-procedure). Let E be a hypergraph of rank r, i.e., |e| ≤ r for all edges e ∈ E. There exists an algorithm which finds one of several defective edges in E with at most ⌈log2|E|⌉ + r edge tests.

(ii) The TJ-procedure works on the premise that E does not contain an already identified defective edge e; if it did, the procedure might identify e repeatedly. To prevent this, as soon as a defective edge is identified, we must partition its vertices into different subsets such that a future test on the vertices of any single subset will not encounter an identified defective edge. This is the purpose of the partition stage in Johann’s procedure.

However, we still need to identify defective edges whose vertices are spread over different subsets, and testing vertices from different subsets may bring back identified defective edges. For r = 2, each defective edge involves only two vertices, and Johann handled this problem cleverly by mixing vertices from two subsets only when one of them contains a single vertex. Say the two subsets are Vi and Vj, and v is the single vertex from Vj. Then we can avoid any identified defective edge (v, u), u ∈ Vi, by (temporarily) deleting u from Vi. For r > 2, the problem is much more complicated.

Instead of taking one vertex from outside Vi to test with the vertices in Vi, we have to take a set K of k vertices from outside to test with the vertices in Vi, since such a combination of vertices may contain a defective edge. We do this by fixing K and then searching for all defective edges containing K, which is achieved by requiring every test to contain K. Let CH(r) denote our proposed algorithm for r-graphs. Suppose we test K ∪ S, where S is a subset of V \ K. Then we can call the subroutine CH(r − k) to search for all induced defective edges e (of rank r − k) such that e ∪ K is a defective edge.
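The induced defective edges can be illustrated with a small Python sketch, assuming the defective set is known explicitly; `induced_defective` and `edge_test_with_K` are hypothetical names. It also shows a subtlety: a test on K ∪ S can turn positive because of a defective edge that does not contain K at all, so the sets handed to the subroutine must be chosen to exclude such edges.

```python
from typing import AbstractSet, FrozenSet, Iterable, Set

Edge = FrozenSet[int]


def induced_defective(defective: AbstractSet[Edge], K: FrozenSet[int]) -> Set[Edge]:
    """Targets of CH(r - k) with K imposed: {X - K : X defective and K subset of X}."""
    return {X - K for X in defective if K <= X}


def edge_test_with_K(S: Iterable[int], K: FrozenSet[int],
                     defective: AbstractSet[Edge]) -> bool:
    """An edge test on K | S, as issued inside the subroutine."""
    T = K | frozenset(S)
    return any(X <= T for X in defective)


D = {frozenset({1, 2, 3}), frozenset({1, 4, 5}), frozenset({2, 4, 6})}
K = frozenset({1})
print(sorted(sorted(e) for e in induced_defective(D, K)))  # [[2, 3], [4, 5]]
print(edge_test_with_K({2, 4, 6}, K, D))  # True, but caused by {2, 4, 6}, which avoids K
```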

We have to avoid identifying a defective edge contained in K ∪ S that was already identified before the induction stage CH(r − k) is reached. This means that all such identified defective edges must be broken into at least two subsets which are not to be mixed in tests. On the other hand, these subsets may contain an induced defective edge not yet identified, so we need to mix them in tests. This dilemma is solved by recursively partitioning Vi in a sequence of subroutines until the induced defective edge is of rank 1, namely, until r − 1 vertices have been specified to be attached to each test in the subroutine. Then any vertex in Vi which, together with the r − 1 specified vertices, constitutes an identified defective edge can be deleted from Vi. Since breaking up an identified defective edge has more implications for general r than in the r = 2 case, we have to replace the more efficient but complicated multi-subset scheme in [6] by a 2-subset scheme.

For r′ < r, since CH(r′) may be called as a subroutine, we need to solve CH(r′) in a more general context. Suppose CH(r′) has vertex set V′. Then a set K not in V′ is imposed so that all defective edges in CH(r′) are induced by K, i.e., the set of induced defective edges in CH(r′) is {X \ K : X is a defective edge containing K}. If CH(r′) is not used


(iii) We assume that r is small and can be treated as a constant. Thus the number of tests will be expressed as a function of |V|, |E| and d. As in Johann’s algorithm, we bound the number of tests used in identifying each defective edge, including the negative tests (tests with a negative outcome) occurring during the process. Since a positive test always initiates the TJ-procedure and is counted as a test of the procedure, all positive tests are counted as tests in identifying defective edges. Further, some negative tests also occur in that procedure and are counted there. So the analysis reduces to counting the negative tests occurring elsewhere. Unlike Johann’s algorithm, we do not attempt to optimize the size of negative tests (we cannot, since we do not know d), and we do not associate each negative test with the identification of a defective edge. The result is a very simple algorithm and analysis. The price we pay is that each defective edge consumes ⌈log2|E|⌉ tests instead of the ⌈log2(|E|/d)⌉ tests obtained by Johann (we suspect that a competitive algorithm cannot achieve the latter).

3. An algorithm for r-graphs

Let E0(K) = {e : e ∪ K ∈ E, e ⊆ V0}. Let Ki denote the subset imposed on CH(i), i.e., Ki is part of every test in CH(i). If the original problem is defined on an r-graph, then |Ki| = r − i. Finally, let I denote the set of currently identified defective edges (in the original problem). For a set W of vertices, let I(W) denote the subset of I formed by those edges whose vertices all belong to W.
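The two set constructions E0(K) and I(W) can be sketched directly in Python. This assumes, as in the algorithm, that the imposed set K is disjoint from V0, so that each e ∈ E0(K) is X \ K for an edge X ⊇ K; the names `E0` and `I_of` are hypothetical.

```python
from itertools import combinations
from typing import AbstractSet, FrozenSet, Iterable, Set

Edge = FrozenSet[int]


def E0(edges: Iterable[Edge], K: FrozenSet[int], V0: AbstractSet[int]) -> Set[Edge]:
    """E0(K) = {e : e | K in E, e subset of V0}, assuming K and V0 are disjoint."""
    return {X - K for X in edges if K <= X and (X - K) <= V0}


def I_of(identified: Iterable[Edge], W: AbstractSet[int]) -> Set[Edge]:
    """I(W): identified defective edges whose vertices all belong to W."""
    return {e for e in identified if e <= W}


EDGES = [frozenset(c) for c in combinations(range(1, 6), 3)]  # all 3-subsets of {1..5}
print(sorted(sorted(e) for e in E0(EDGES, frozenset({1}), {2, 3, 4})))
# [[2, 3], [2, 4], [3, 4]]
identified = {frozenset({1, 2, 3}), frozenset({2, 4, 5})}
print(I_of(identified, {1, 2, 3, 4}))  # {frozenset({1, 2, 3})}
```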

We give the algorithm CH(r) recursively. We first define CH(1).

Algorithm CH(1)

Input: E, K1, V0, V1, I (if CH(1) is not a subroutine, then V0 = V(E) and K1 = V1 = I = ∅). Attach K1 to every test.

Step 1: Test V0. If positive, use the halving procedure, which we treat as a special case of the TJ-procedure, to identify a defective vertex u (e = K1 ∪ {u}). Set I = I ∪ {e} and V0 = V0 \ {u}, and go back to Step 1.

Step 2: For every vertex v ∈ V1, test K1 ∪ {v}. If positive, set I = I ∪ {K1 ∪ {v}}.

Step 3: Stop.
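A minimal Python sketch of CH(1), assuming the oracle `test(S)` already has K1 attached and answers positively exactly when some u ∈ S has K1 ∪ {u} defective (that is, V0 and V1 hide no other defective edges). The halving procedure is implemented as ordinary binary splitting; `halving` and `ch1` are hypothetical names, and the loop skips testing an empty V0.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple

Oracle = Callable[[FrozenSet[int]], bool]


def halving(S: Sequence[int], test: Oracle) -> int:
    """Binary splitting: return one positive vertex of S, assuming test(S) is positive."""
    S = list(S)
    while len(S) > 1:
        half = S[: len(S) // 2]
        S = half if test(frozenset(half)) else S[len(S) // 2:]
    return S[0]


def ch1(V0: Iterable[int], V1: Iterable[int], test: Oracle) -> Tuple[List[int], int]:
    """CH(1): vertices u with K1 + {u} defective, and the number of tests used."""
    calls = 0

    def counted(S: FrozenSet[int]) -> bool:
        nonlocal calls
        calls += 1
        return test(S)

    found: List[int] = []
    pool = sorted(V0)
    while pool and counted(frozenset(pool)):  # Step 1: test V0
        u = halving(pool, counted)
        found.append(u)
        pool.remove(u)  # V0 = V0 minus {u}, then back to Step 1
    for v in sorted(V1):  # Step 2: test K1 + {v} for each v in V1
        if counted(frozenset({v})):
            found.append(v)
    return found, calls


positives = {3, 7, 11}
print(ch1(range(8), {10, 11}, lambda S: bool(S & positives)))  # ([3, 7, 11], 11)
```

In the example, Step 1 ends with exactly one negative test, and Step 2 spends one test per vertex of V1, which matches the count 1 + d_K used in the analysis for r = 1.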

Algorithm CH(r′) (r ≥ r′ ≥ 2)

The partition stage:

Input: E, K_{r′}, V0, V1, I (if CH(r′) is not a subroutine, i.e., r′ = r, then V0 = V(E) and V1 = K_{r′} = I = ∅). Attach K_{r′} to every test.

Step 1: Test V0. If positive, use the TJ-procedure to identify a defective edge e = {v1, v2, . . . , v_{r′}} ⊆ V0. Add the vertex v1 to V1. Set V0 = V0 \ {v1} and I = I ∪ {e ∪ K_{r′}}. If |V0| ≥ r′, go back to Step 1.

Step 2: Stop.

Suppose V0 and V1 are nonempty. Go to the search stage.
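The partition stage can be sketched as follows, assuming an oracle `test` with K_{r′} already attached and a hypothetical callback `find_edge` standing in for the TJ-procedure (here a brute-force search over a known defective set, used only for illustration). The function name `partition_stage` is ours. Each identified edge moves one of its vertices into V1, which is why every vertex of V1 can be charged to a distinct defective edge.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Edge = FrozenSet[int]


def partition_stage(
    V0: Set[int],
    r_prime: int,
    K: FrozenSet[int],
    test: Callable[[FrozenSet[int]], bool],
    find_edge: Callable[[FrozenSet[int]], Tuple[int, ...]],
) -> Tuple[Set[int], List[int], Set[Edge], int]:
    """Partition stage of CH(r') (sketch).

    find_edge(V0) is a hypothetical stand-in for the TJ-procedure returning the
    vertices (v1, ..., v_r') of some defective edge inside V0. The returned count
    includes every Step-1 test; the analysis charges positive ones to the TJ-procedure.
    """
    V0, V1, new_edges, stage_tests = set(V0), [], set(), 0
    while True:
        stage_tests += 1  # Step 1: test V0
        if not test(frozenset(V0)):
            break  # the one negative test of this stage
        e = find_edge(frozenset(V0))
        V1.append(e[0])  # move v1 into V1, which breaks e inside V0
        V0.discard(e[0])
        new_edges.add(K | frozenset(e))  # I = I + {e | K}
        if len(V0) < r_prime:
            break
    return V0, V1, new_edges, stage_tests


D = {frozenset({1, 2}), frozenset({2, 3}), frozenset({4, 5})}


def brute_force_edge(V: FrozenSet[int]) -> Tuple[int, ...]:
    return min(tuple(sorted(x)) for x in D if x <= V)


result = partition_stage({1, 2, 3, 4, 5}, 2, frozenset(),
                         lambda S: any(x <= S for x in D), brute_force_edge)
print(result[0], result[1], result[3])  # {3, 5} [1, 2, 4] 4
```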

The search stage:

Step 1: Set k = 1.

Step 2: Let K be a k-subset of V1. Set K_{r′−k} = K_{r′} ∪ K. Construct a vertex cover C (C ∩ K_{r′−k} = ∅) of I(K_{r′−k} ∪ V(E0(K_{r′−k}))). Call the subroutine CH(r′ − k) with E = E0(K_{r′−k}), V = V(E0(K_{r′−k})), V0 = V(E0(K_{r′−k})) \ C and V1 = C. If for some v ∈ V0, K_{r′−k} ∪ {v} ∈ I(K_{r′−k} ∪ V(E0(K_{r′−k}))) (possible only for k = r′ − 1), delete v from V0.

Do this for all k-subsets K. Set k = k + 1. If k < r′, go back to Step 2.

Step 3: Test all r′-subsets S of V1 (except those with S ∪ K_{r′} ∈ I). If positive, set I = I ∪ {S ∪ K_{r′}}.

We will refer to tests in Step 3 as direct tests.

Step 4: Stop.
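The direct tests of Step 3 amount to a simple enumeration. In this sketch the oracle `test` receives K_{r′} ∪ S explicitly; `direct_tests` and its parameters are hypothetical names. The number of tests spent is at most C(|V1|, r′), which is where the d^r term of the analysis comes from, given |V1| ≤ d.

```python
from itertools import combinations
from typing import AbstractSet, Callable, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[int]


def direct_tests(V1: Iterable[int], r_prime: int, K: FrozenSet[int],
                 identified: AbstractSet[Edge],
                 test: Callable[[FrozenSet[int]], bool]) -> Tuple[Set[Edge], int]:
    """Step 3: test K | S for every r'-subset S of V1 not already identified."""
    new: Set[Edge] = set()
    used = 0
    for S in combinations(sorted(V1), r_prime):
        X = K | frozenset(S)
        if X in identified:  # skip S with S | K already in I
            continue
        used += 1
        if test(X):
            new.add(X)
    return new, used


D = {frozenset({1, 2}), frozenset({3, 4})}
new, used = direct_tests({1, 2, 3, 4}, 2, frozenset(), {frozenset({1, 2})},
                         lambda S: any(x <= S for x in D))
print(new, used)  # {frozenset({3, 4})} 5
```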

Theorem 1. Let E be an arbitrary r-graph which contains d defective edges, where d is not necessarily known. Then the algorithm CH(r) identifies all defective edges of E with at most $d\lceil\log_2|E|\rceil + (r-1)\binom{r}{2}d^r + o(d^r)$ tests.

Proof. Clearly, every edge identified as defective by the algorithm is identified either through the TJ-procedure or by a direct test, both of which are error-free. Thus it suffices to prove that every defective edge is identified.

Suppose a defective edge with vertex set X is not identiﬁed at the partition stage of CH(r). Then a nonempty subset


selection is K = X. Suppose |X| = k. Then the problem is reduced to the subroutine CH(r − k) with K imposed. By induction on r, the induced defective edge X \ K is identified in the subroutine, which means that (X \ K) ∪ K = X is identified as a defective edge.

It remains to count the number of tests CH(r) uses. By Lemma 1, the TJ-procedure uses at most ⌈log2|E|⌉ + r tests. Since a defective edge is identified either by the TJ-procedure or by a direct test, the number of tests consumed in identifying one defective edge is bounded by ⌈log2|E|⌉ + r. This bound includes the possible positive test initiating the identification process and all negative tests occurring during the process of identifying the defective edge. Thus, the number of tests identifying d defective edges is at most d(⌈log2|E|⌉ + r). Hence it suffices to count the number N(r) of negative tests occurring elsewhere in CH(r). There are three sources of tests in N(r): the one negative test ending the partition stage, the tests in subroutines, and the direct tests.

Denote by D_K the set of all defective edges in K ∪ V(E0(K)), and let d_K = |D_K|. Note that for K ≠ K′, D_K and D_{K′} may overlap in defective edges containing some vertices in K ∩ K′ and other vertices in V(E0(K)). Hence we can only bound d_K by d. However, for |K| = |K′| = 1, D_K and D_{K′} are disjoint; hence $\sum_{K:|K|=1} d_K \le d$. We count the number N(r) of negative tests in CH(r) by induction on r.

For r = 1, N(r) is easily verified to be at most 1 + d_K, since the negative tests counted in N(r) consist of the one at the end of Step 1 and those occurring in Step 2, and |V1| ≤ d_K. So Theorem 1 holds for r = 1.

Note that a vertex is in V1 either because it is in the vertex cover C (which covers the set of identified defective edges), or because it is a vertex of a defective edge identified from the updated V0 at the partition stage. Thus every vertex in V1 corresponds to a distinct defective edge; hence |V1| ≤ d.

We prove the general case r ≥ 2 by induction.

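$$
\begin{aligned}
N(r) &\le 1 + \sum_{k=1}^{r-2}\ \sum_{K\subseteq V_1:|K|=k} N(r-k) + \sum_{K\subseteq V_1:|K|=r-1} N(1) + \binom{|V_1|}{r}\\
&\le 1 + \sum_{k=1}^{r-2}\ \sum_{K\subseteq V_1:|K|=k}\Bigl((r-k-1)\binom{r-k}{2} d_K^{\,r-k} + o\bigl(d_K^{\,r-k}\bigr)\Bigr) + \sum_{K\subseteq V_1:|K|=r-1}(1+d_K) + d^r\\
&\le \sum_{K\subseteq V_1:|K|=1}(r-2)\binom{r-1}{2} d_K^{\,r-1} + \sum_{k=2}^{r-2}\ \sum_{K\subseteq V_1:|K|=k}(r-k-1)\binom{r-k}{2} d_K^{\,r-k} + d^{r-1}(1+d) + d^r + o(d^r)\\
&\le (r-2)\binom{r-1}{2}\Bigl(\sum_{K\subseteq V_1:|K|=1} d_K\Bigr)^{r-1} + (r-3)\binom{r}{2}\sum_{k=2}^{r-2}\binom{d}{k} d^{\,r-k} + 2d^r + o(d^r)\\
&\le (r-3)\binom{r}{2} d^r + 2d^r + o(d^r)\\
&= \Bigl((r-3)\binom{r}{2} + 2\Bigr) d^r + o(d^r) \le (r-1)\binom{r}{2} d^r + o(d^r).
\end{aligned}
$$

Here the second inequality is the induction hypothesis together with |V1| ≤ d; the third uses d_K ≤ d and $\binom{|V_1|}{r-1} \le d^{r-1}$; the fourth uses $(r-k-1)\binom{r-k}{2} \le (r-3)\binom{r}{2}$ for k ≥ 2; and the fifth uses $\sum_{K:|K|=1} d_K \le d$ (so the first term is $O(d^{r-1})$) and $\binom{d}{k}d^{r-k} \le d^r/k!$ with $\sum_{k\ge 2} 1/k! < 1$. These steps assume r ≥ 3; for r = 2 the sums over k are empty and directly $N(2) \le 1 + \sum_{K\subseteq V_1:|K|=1}(1+d_K) + \binom{|V_1|}{2} \le 1 + 2d + d^2 = d^2 + o(d^2)$.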

Thus, $N(r) \le (r-1)\binom{r}{2}d^r + o(d^r)$ holds for general r.

Let T(r) denote the total number of tests required by CH(r). Since T(r) ≤ d(⌈log2|E|⌉ + r) + N(r) and $N(r) \le (r-1)\binom{r}{2}d^r + o(d^r)$, we have $T(r) \le d\lceil\log_2|E|\rceil + (r-1)\binom{r}{2}d^r + o(d^r)$ for r ≥ 2 (the term rd is absorbed into o(d^r)). Therefore, algorithm CH(r) needs at most $d\lceil\log_2|E|\rceil + (r-1)\binom{r}{2}d^r + o(d^r)$ tests to identify all d defective edges in E.

Unfortunately, we are not able to give more explicit functions than the o(·) terms used in the derivation.
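Because the o(d^r) term is not made explicit, one can only evaluate the leading terms of Theorem 1. The sketch below does so for a hypothetical instance (r = 3, all 3-subsets of 100 vertices); `leading_bound` is our own helper. It prints the per-edge part d⌈log2|E|⌉ next to the full leading bound and shows how quickly the (r − 1)C(r, 2)d^r term takes over as d grows, which is why the bound is most informative when d is small.

```python
import math


def leading_bound(num_edges: int, d: int, r: int) -> int:
    """d*ceil(log2|E|) + (r-1)*C(r,2)*d**r, i.e. Theorem 1 without the o(d^r) term."""
    per_edge = (num_edges - 1).bit_length()  # ceil(log2 |E|) for |E| >= 1
    return d * per_edge + (r - 1) * math.comb(r, 2) * d ** r


E_SIZE = math.comb(100, 3)  # hypothetical instance: all 3-subsets of 100 vertices
for d in (1, 2, 4, 8):
    print(d, d * (E_SIZE - 1).bit_length(), leading_bound(E_SIZE, d, 3))
```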

4. An algorithm for hypergraphs

Let E be a hypergraph of rank r, i.e., |e| ≤ r for all edges e ∈ E. It is assumed that no defective edge is contained in another (an assumption made in all models dealing with the quoted biological application). To identify the set D ⊆ E in a hypergraph, we follow the general approach of algorithm CH(r) for r-graphs with a slight modification.


The search stage in CH∗(r) is slightly different from that in CH(r). When we choose a k-subset K of V1, before constructing a vertex cover and calling CH∗(r − k), we first test K itself. If the outcome is positive, then K ∈ D. By our assumption, no other defective edge contains K, so we do not need to call CH∗(r − k). If the outcome is negative, we call CH∗(r − k) to identify all induced defective edges in E0(K).
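One pass of this modified step for a fixed k can be sketched as below, assuming an oracle `test`, a mutable set `identified` standing for I, and a hypothetical callback `subroutine` in place of the recursive call CH∗(r − k); the function name `star_search_level` is ours. Sets whose union with the imposed set already contains an identified edge are skipped, and a positive test on the set itself ends the search for that K.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set

Edge = FrozenSet[int]


def star_search_level(V1: Iterable[int], k: int, K_top: FrozenSet[int],
                      identified: Set[Edge],
                      test: Callable[[FrozenSet[int]], bool],
                      subroutine: Callable[[FrozenSet[int]], None]) -> List[FrozenSet[int]]:
    """CH*-style Step 2 for one k: test K_top | K first, recurse only if negative."""
    calls = []
    for K in combinations(sorted(V1), k):
        K_new = K_top | frozenset(K)
        if any(e <= K_new for e in identified):  # G(K_top | K) holds an identified edge
            continue
        if test(K_new):  # K_new is itself defective; by assumption,
            identified.add(K_new)  # no other defective edge contains it
        else:
            subroutine(K_new)  # stand-in for CH*(r' - k) with K_new attached
            calls.append(K_new)
    return calls


D = {frozenset({2}), frozenset({1, 3})}
ident: Set[Edge] = set()
seen: List[FrozenSet[int]] = []
first = star_search_level({1, 2, 3}, 1, frozenset(), ident,
                          lambda S: any(x <= S for x in D), seen.append)
second = star_search_level({1, 2, 3}, 2, frozenset(), ident,
                           lambda S: any(x <= S for x in D), seen.append)
print(first, second, ident)
```

In the example, the singleton {2} is found by its own test, so no subroutine is called for it, and the pair {1, 3} is found when k = 2 while the pairs containing 2 are skipped.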

Algorithm CH∗(1) is the same as CH(1). We now give the algorithm CH∗(r′) recursively.

Algorithm CH∗(r′) (r ≥ r′ ≥ 2)

Input: E, K_{r′}, V0, V1, I (if CH∗(r′) is not a subroutine, then V0 = V(E) and V1 = K_{r′} = I = ∅).

The partition stage:

Step 1: Test V0. If positive, use the TJ-procedure to identify a defective edge e = {v1, v2, . . . , vs} ⊆ V0 (s ≤ r′). Add the vertex v1 to V1. Set V0 = V0 \ {v1} and I = I ∪ {e ∪ K_{r′}}. If V0 ≠ ∅, go back to Step 1.

Step 2: Stop.

Suppose V0 and V1 are nonempty. Go to the search stage.

The search stage:

Step 1: Set k = 1.

Step 2: Choose a k-subset K of V1 such that G(K_{r′} ∪ K) does not contain any identified defective edge in I. Set K_{r′−k} = K_{r′} ∪ K.

Test K_{r′−k}. If positive, let I = I ∪ {K_{r′−k}}.

Otherwise, construct a vertex cover C (C ∩ K_{r′−k} = ∅) of I(K_{r′−k} ∪ V(E0(K_{r′−k}))). Call the subroutine CH∗(r′ − k) with E = E0(K_{r′−k}), V = V(E0(K_{r′−k})), V0 = V(E0(K_{r′−k})) \ C and V1 = C. If for some v ∈ V0, K_{r′−k} ∪ {v} ∈ I(K_{r′−k} ∪ V(E0(K_{r′−k}))) (possible only for k = r′ − 1), delete v from V0. Attach K_{r′−k} to every test in CH∗(r′ − k).

Do this for all k-subsets K. Set k = k + 1. If k < r′, go back to Step 2.

Step 3: Test all r′-subsets S of V1 (except those with S ∪ K_{r′} ∈ I). If positive, set I = I ∪ {S ∪ K_{r′}}.

We will also refer to tests in Step 3 as direct tests.

Step 4: Stop.

Theorem 2. Let E be a hypergraph of rank r with d defective edges, where d is not necessarily known. Then the algorithm CH∗(r) identifies all defective edges in E with at most $d\lceil\log_2|E|\rceil + (r-1)\binom{r}{2}d^r + o(d^r)$ tests.

Proof. Similar to the proof of Theorem 1, we can show that CH∗(r) identiﬁes all defective edges of the hypergraph E.

To count the number of tests CH∗(r) uses, let N∗(r) and T∗(r) be the counterparts of N(r) and T (r) in CH∗(r).

The analysis of the number of tests of CH∗(r) is also similar to that of CH(r). The only difference is that each subroutine call in CH∗(r) may need N∗(r − k) + 1 tests instead of the N(r − k) tests in CH(r). Since there are at most $\sum_{k=1}^{r-1}\binom{|V_1|}{k} \le (r-1)d^{r-1}$ such calls, the extra tests are absorbed into o(d^r) and do not change the result; so $N^*(r) \le (r-1)\binom{r}{2}d^r + o(d^r)$. Consequently, $T^*(r) \le d\lceil\log_2|E|\rceil + (r-1)\binom{r}{2}d^r + o(d^r)$ holds for r ≥ 2. Therefore, algorithm CH∗(r) needs at most $d\lceil\log_2|E|\rceil + (r-1)\binom{r}{2}d^r + o(d^r)$ tests to identify all d defective edges of E.

Acknowledgment

We thank Dr. P. Johann for a careful reading of the paper and many helpful suggestions. We also thank the reviewers for their efforts to improve the readability of this paper.

References

[1] M. Aigner, Combinatorial Search, Wiley-Teubner Series in Computer Science, Wiley, New York, 1988.

[2] G.J. Chang, F.K. Hwang, A group testing problem on two disjoint sets, SIAM J. Algebraic Discrete Methods 2 (1981) 35–38.

[3] P. Damaschke, A tight upper bound for group testing in graphs, Discrete Appl. Math. 48 (1994) 101–109.

[4] D.Z. Du, F.K. Hwang, Combinatorial Group Testing and its Applications, World Scientific, Singapore, 1993.

[5] F.K. Hwang, A competitive algorithm to find all defective edges in a graph, Discrete Appl. Math. 148 (2005) 273–277.

[6] P. Johann, A group testing problem for graphs with several defective edges, Discrete Appl. Math. 117 (2002) 99–108.

[7] D.C. Torney, Set pooling designs, Ann. Combin. 3 (1999) 95–101.