The minimum number of e-vertex-covers among hypergraphs with e edges of given ranks


Discrete Applied Mathematics


F.H. Chang^{a,∗}, H.L. Fu^{b}, F.K. Hwang^{b}, B.C. Lin^{c}

^{a} Department of Applied Mathematics, National Chiayi University, Chiayi City, 60004, Taiwan
^{b} Department of Applied Mathematics, National Chiao Tung University, Hsinchu, 30050, Taiwan
^{c} Department of Math, National Central University, Chung-Li, 32054, Taiwan

Article info

Article history:
Received 25 May 2004
Received in revised form 28 August 2007
Accepted 21 May 2008
Available online 30 June 2008

Keywords: Vertex-cover; Hypergraph; Pooling design

Abstract

We study the following problem: among all hypergraphs with $v$ vertices and $e$ edges of ranks $l_1, \dots, l_e$, which hypergraph has the fewest vertex-covers of size $e$? The problem is very difficult, and we obtain only partial answers. We show an application of our results that improves the error tolerance of a pooling design proposed in the literature.

1. Introduction

Let $H(V, E)$ denote a hypergraph with vertex set $V$ and edge set $E = \{E_1, \dots, E_e\}$. Let $l_i$ denote the rank of $E_i$, i.e. $l_i = |E_i|$. Note that $l_i = 1$ is allowed. A subset $V' \subseteq V$ is called a vertex-cover of $H$ if it intersects every $E_i \in E$. It is called a $d$-vertex-cover if $|V'| = d$. Let $f_d(H)$ denote the number of $d$-vertex-covers in $H$.

Define

\[ \mathcal{H}(v; l_1, \dots, l_e) = \{ H(V, E) : |V| = v,\ |E| = e,\ |E_i| = l_i,\ i = 1, \dots, e \} \]

and

\[ f_d(v; l_1, \dots, l_e) = \min \{ f_d(H) : H \in \mathcal{H}(v; l_1, \dots, l_e) \}. \]

We study $f_d(v; l_1, \dots, l_e)$ in this paper.
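As a concrete reading of these definitions, the following minimal Python sketch counts $d$-vertex-covers by exhaustive enumeration. The function name `count_vertex_covers` and the small example hypergraph are illustrative assumptions, not part of the paper.

```python
from itertools import combinations

def count_vertex_covers(vertices, edges, d):
    """Return f_d(H): the number of d-subsets of `vertices` that meet every edge."""
    edge_sets = [frozenset(E) for E in edges]
    return sum(
        1
        for subset in combinations(vertices, d)
        if all(set(subset) & E for E in edge_sets)
    )

# Hypothetical example: v = 5 vertices, two edges of ranks (3, 2) sharing vertex 2;
# vertex 4 is isolated.
example_vertices = range(5)
example_edges = [{0, 1, 2}, {2, 3}]
f2 = count_vertex_covers(example_vertices, example_edges, 2)
print(f2)
```

The enumeration is exponential in $d$, so it is only meant for checking small cases by hand.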

This problem is motivated by the construction of error-detecting and error-correcting pooling designs (used in clone library screening). Ngo and Du [4] observed that the number of e-vertex-covers has bearing on the error-correcting ability of a pooling design.

2. The hypergraph case

A vertex is called an isolated vertex if it is not in any edge. A hypergraph $H(V, E)$ is called optimal in $\mathcal{H}(v; l_1, \dots, l_e)$ if it achieves $f_d(v; l_1, \dots, l_e)$.

Lemma 2.1. For any given $\mathcal{H}(v; l_1, \dots, l_e)$, there exists an optimal $H^*$ such that $H^*$ does not have both an isolated vertex and a vertex shared by two edges.

∗ Corresponding author. Tel.: +886 5 2717917.

E-mail addresses: fei@mail.ncyu.edu.tw (F.H. Chang), hlfu@math.nctu.edu.tw (H.L. Fu), fkhwang@gmail.com (F.K. Hwang), beychi.lin@gmail.com (B.C. Lin).


Proof. Let $H$ be a hypergraph in $\mathcal{H}(v; l_1, \dots, l_e)$ with an isolated vertex $i$ and a vertex $j$ shared by two edges $a$ and $b$. Let $H'$ be obtained from $H$ by eliminating $i$ and splitting $j$ into two vertices $j_a$ and $j_b$ such that $j_a \in a$ and $j_b \in b$, i.e., $a$ and $b$ no longer share the vertex $j$. We show $f_e(H') \le f_e(H)$ by mapping all $e$-vertex-covers $C'$ of $H'$ to distinct $e$-vertex-covers $C$ of $H$.

We consider several cases:

(i) $C'$ contains neither $j_a$ nor $j_b$. Set $C = C'$.

(ii) $C'$ contains both $j_a$ and $j_b$. $C$ is obtained by replacing $j_a$ with $j$ and $j_b$ with $i$.

(iii) $C'$ contains $j_a$ but not $j_b$. $C$ is obtained by replacing $j_a$ with $j$.

(iv) $C'$ contains $j_b$ but not $j_a$, and no other vertex of $b$. $C$ is obtained by replacing $j_b$ with $j$.

(v) $C'$ contains $j_b$ and another vertex of $b$, but not $j_a$. $C$ is obtained by replacing $j_b$ with $i$.

Clearly, $C$ is an $e$-vertex-cover of $H$. To check that the mapping is injective, observe that the only time two distinct $e$-vertex-covers $C'$ and $C''$ of $H'$ can map to the same $C$ is when $C'$ and $C''$ differ in only one vertex, i.e., $C' = S \cup \{j_a\}$ and $C'' = S \cup \{j_b\}$, where $S$ is a set of $e - 1$ vertices containing neither $j_a$ nor $j_b$. Since $C''$ covers $a$, $S$ must contain a vertex of $a$ other than $j_a$; likewise, since $C'$ covers $b$, $S$ must contain a vertex of $b$ other than $j_b$. Hence $S$ contains both a vertex of $a$ and a vertex of $b$. From our mapping rule, $C'$ will be mapped to $S \cup \{j\}$, while $C''$ will be mapped to $S \cup \{i\}$, two distinct $e$-vertex-covers in $H$.

By repeating this procedure, and since $|\mathcal{H}(v; l_1, \dots, l_e)|$ is finite, eventually we reach a hypergraph which has either no isolated vertices or no vertices shared by two edges. Since this argument holds for all $\mathcal{H}(v; l_1, \dots, l_e)$, Lemma 2.1 follows.
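The splitting step in the proof can be tried on a small instance. The sketch below is only an illustration under stated assumptions: the helper names `count_covers` and `split_shared_vertex` and the example hypergraph are hypothetical, and the label of the removed isolated vertex $i$ is reused as the new vertex $j_b$, so the vertex set keeps its size.

```python
from itertools import combinations

def count_covers(vertices, edges, d):
    """Number of d-subsets of `vertices` that intersect every edge."""
    return sum(1 for S in combinations(sorted(vertices), d)
               if all(E.intersection(S) for E in edges))

def split_shared_vertex(edges, i, j, b):
    """Edges of H': in edge number b the shared vertex j is replaced by the
    formerly isolated vertex i (i plays the role of j_b, j that of j_a)."""
    assert all(i not in E for E in edges), "i must be isolated"
    new_edges = [set(E) for E in edges]
    assert j in new_edges[b], "edge b must contain j"
    new_edges[b].remove(j)
    new_edges[b].add(i)
    return new_edges

# Hypothetical example: ranks (3, 2, 2); vertex 2 lies in edges 0 and 1,
# and vertex 5 is isolated.
vertices = set(range(6))
H = [{0, 1, 2}, {2, 3}, {3, 4}]
H_split = split_shared_vertex(H, i=5, j=2, b=1)

before = count_covers(vertices, H, len(H))
after = count_covers(vertices, H_split, len(H))
print(before, after)  # Lemma 2.1 predicts after <= before
```

Reusing the label of $i$ is a convenience of the sketch; the proof itself treats $j_a$ and $j_b$ as new vertices.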

Define $l = \sum_{i=1}^{e} l_i$ and $L = \prod_{i=1}^{e} l_i$. For $v \ge l$, from Lemma 2.1, $f_e(v; l_1, \dots, l_e)$ is nondecreasing in $v$. We next show that it reaches its maximum at a certain $v$.

Theorem 2.2. $f_e(v; l_1, \dots, l_e) = L$ for $v \ge l$.

Proof. Since any $H \in \mathcal{H}(v; l_1, \dots, l_e)$ with two intersecting edges must have an isolated vertex, by Lemma 2.1 there exists an optimal hypergraph $H^*$ with no intersecting edges, i.e., $H^*$ contains $e$ disjoint edges of ranks $l_1, \dots, l_e$ and $v - l$ isolated vertices. Clearly, $H^*$ has $L$ $e$-vertex-covers.

Theorem 2.3.
\[ f_e(l - 1; l_1, \dots, l_e) = \prod_{i \ne 1,2} l_i \left[ l_1 l_2 - 1 + \frac{1}{2} \sum_{i \ne 1,2} (l_i - 1) \right]. \]

Proof. Without loss of generality, assume $l_1 \ge l_2 \ge \cdots \ge l_e$. By Lemma 2.1, it suffices to consider $H$ with no isolated vertices.

Since $v = l - 1$, $H$ contains exactly two edges intersecting in one vertex. Let $H_{mn}$ denote the hypergraph where $E_m$ and $E_n$ intersect, and $f_e(H_{mn})$ its number of $e$-vertex-covers. Then

\[
\begin{aligned}
f_e(H_{mn}) &= \left[ (l_m - 1)(l_n - 1) + l_m + l_n - 2 \right] \prod_{i \ne m,n} l_i + \sum_{k \ne m,n} \binom{l_k}{2} \prod_{i \ne m,n,k} l_i \\
&= (l_m l_n - 1) \prod_{i \ne m,n} l_i + \frac{1}{2} \sum_{k \ne m,n} l_k (l_k - 1) \prod_{i \ne m,n,k} l_i \\
&= (l_m l_n - 1) \prod_{i \ne m,n} l_i + \frac{1}{2} \sum_{i \ne m,n} (l_i - 1) \prod_{i \ne m,n} l_i \\
&= \prod_{i \ne m,n} l_i \left[ l_m l_n - 1 + \frac{1}{2} \sum_{i \ne m,n} (l_i - 1) \right].
\end{aligned}
\]

If $l_1 = l_2 = \cdots = l_e$, then Theorem 2.3 obviously holds; if not, we prove the following: suppose $l_y > l_z$. Then $f_e(H_{xy}) < f_e(H_{xz})$ for all other $x \in \{1, \dots, e\}$.

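\[
\begin{aligned}
f_e(H_{xy}) &= \prod_{i \ne x,y} l_i \left[ l_x l_y - 1 + \frac{1}{2} \sum_{i \ne x,y} (l_i - 1) \right] \\
&= \prod_{i \ne x,y,z} l_i \left[ l_x l_y l_z - l_z + \frac{1}{2} l_z (l_z - 1) + \frac{1}{2} l_z \sum_{i \ne x,y,z} (l_i - 1) \right] \\
&= \prod_{i \ne x,y,z} l_i \left[ l_x l_y l_z + \frac{1}{2} l_z (l_z - 3) + \frac{1}{2} l_z \sum_{i \ne x,y,z} (l_i - 1) \right] \\
&< \prod_{i \ne x,y,z} l_i \left[ l_x l_y l_z + \frac{1}{2} l_y (l_y - 3) + \frac{1}{2} l_y \sum_{i \ne x,y,z} (l_i - 1) \right] \\
&= f_e(H_{xz}).
\end{aligned}
\]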
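The closed form for $f_e(H_{mn})$ can be checked against brute-force counting on a small example. The following is a minimal sketch; the helper names `build_H_mn` and `closed_form`, the 0-based edge indices, and the ranks $(4, 3, 2)$ are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def count_covers(edges, d):
    """Number of d-subsets of the union of the edges that meet every edge."""
    vertices = sorted(set().union(*edges))
    return sum(1 for S in combinations(vertices, d)
               if all(E.intersection(S) for E in edges))

def build_H_mn(ranks, m, n):
    """Disjoint edges of the given ranks, except that edges m and n share one vertex."""
    edges, start = [], 0
    for r in ranks:
        edges.append(set(range(start, start + r)))
        start += r
    shared, dropped = min(edges[m]), max(edges[n])
    edges[n] = (edges[n] - {dropped}) | {shared}
    return edges

def closed_form(ranks, m, n):
    """prod_{i != m,n} l_i * [ l_m l_n - 1 + (1/2) sum_{i != m,n} (l_i - 1) ]."""
    others = [r for i, r in enumerate(ranks) if i not in (m, n)]
    return prod(others) * (ranks[m] * ranks[n] - 1
                           + Fraction(sum(r - 1 for r in others), 2))

ranks = (4, 3, 2)          # sorted so that l_1 >= l_2 >= l_3
e = len(ranks)
results = {(m, n): count_covers(build_H_mn(ranks, m, n), e)
           for m, n in combinations(range(e), 2)}
for pair, value in results.items():
    print(pair, value, closed_form(ranks, *pair))
```

In this example the pair of the two largest edges gives the smallest count, in line with the inequality $f_e(H_{xy}) < f_e(H_{xz})$ for $l_y > l_z$.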

Fig. 1. Graphs with $v = l - 2$ vertices and $e$ edges.

For $v = l - 2$, we have to restrict our attention to the case $l_1 = l_2 = \cdots = l_e = k$. Assume $e \ge 4$; then there are four hypergraphs, as shown in Fig. 1.

We count the number of e-vertex-covers for each of them.

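Lemma 2.4. For $v = l - 2$, $l_1 = l_2 = \cdots = l_e = k$, the numbers of $e$-vertex-covers for the graphs in Fig. 1 are, respectively, as follows:

(i)
\[
f_e((a)_k) = (k-1)^3 k^{e-3} + \left\{ \binom{3k-3}{2} k^{e-3} + \binom{3k-3}{1} \binom{e-3}{1} \binom{k}{2} k^{e-4} + \binom{3k-3}{0} \left[ \binom{e-3}{1} \binom{k}{3} k^{e-4} + \binom{e-3}{2} \binom{k}{2}^2 k^{e-5} \right] \right\}.
\]

(ii)
\[
f_e((b)_k) = (k-1)^2 (k-2) k^{e-3} + 2 \left\{ \left[ \binom{3k-4}{2} - \binom{2k-3}{2} \right] k^{e-3} + \binom{k-1}{1} \binom{e-3}{1} \binom{k}{2} k^{e-4} \right\} + \binom{3k-4}{1} k^{e-3} + \binom{3k-4}{0} \binom{e-3}{1} \binom{k}{2} k^{e-4}.
\]

(iii)
\[
\begin{aligned}
f_e((c)_k) = {} & (k-1)^4 k^{e-4} + 2 \left\{ \left[ 2 \binom{k-1}{2} \binom{k-1}{1} + \binom{k-1}{1}^2 \binom{2k-2}{1} \right] k^{e-4} + \binom{k-1}{1}^2 \binom{e-4}{1} \binom{k}{2} k^{e-5} \right\} \\
& + \left\{ \binom{4k-4}{2} k^{e-4} + \binom{4k-4}{1} \binom{e-4}{1} \binom{k}{2} k^{e-5} + \binom{4k-4}{0} \left[ \binom{e-4}{1} \binom{k}{3} k^{e-5} + \binom{e-4}{2} \binom{k}{2}^2 k^{e-6} \right] \right\}.
\end{aligned}
\]

(iv)
\[
f_e((d)_k) = (k-2)^2 k^{e-2} + 2 \left[ \binom{2k-4}{1} k^{e-2} + \binom{2k-4}{0} \binom{e-2}{1} \binom{k}{2} k^{e-3} \right] + k^{e-2}.
\]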

Proof. We start with the proof of (i) for Fig. 1(a). Clearly, if an $e$-vertex-cover does not contain the vertex $x$, then each of the three edges incident to $x$ has $k - 1$ choices for its cover vertex and each of the other edges has $k$ choices. Thus, the number of such $e$-vertex-covers is $(k-1)^3 k^{e-3}$, which gives the first term. On the other hand, if an $e$-vertex-cover does contain $x$, then it contains at most two other vertices in the $x$-tree, since we have $e - 2$ components. So the second term can be obtained by considering the number of extra vertices in the $x$-tree (different from $x$) which are chosen for the $e$-vertex-cover. It is not difficult to see that if the $e$-vertex-cover contains three vertices of the $x$-tree, then it contains exactly one vertex of each of the other $e - 3$ edges; if it contains two vertices of the $x$-tree, then one of the remaining $e - 3$ edges contains exactly two cover vertices and each of the other $e - 4$ edges contains exactly one; and if it contains only $x$ in the $x$-tree, then we have choices of $(3, 1, 1, \dots, 1)$ or $(2, 2, 1, \dots, 1)$ for the other $e - 3$ edges. Therefore, we have the second term.

Following the same line of reasoning, in each of (ii), (iii), (iv), the terms are broken down into taking none of $x, y$, one of them, and both of them. It is worth noting that in (ii) the difference $\binom{3k-4}{2} - \binom{2k-3}{2}$ represents the case that, assuming $x$ is taken but not $y$, two other vertices are taken from the $(x, y)$-tree, at least one of them from the edge not incident to $x$; and in (iii) the last three terms represent taking both $x$ and $y$, and 2 or 1 or 0 other vertices from the $x$-tree and the $y$-tree.

Theorem 2.5. $f_e(l - 2, k^e) = f_e((b)_k)$ for $e \ge 4$ and $k \ge 2$.

Proof. Here $f_e(l - 2, k^e)$ denotes $f_e(l - 2; l_1, \dots, l_e)$ in the case that all $e$ edges have rank $k$. By using MAPLE, we obtain

\[
\begin{aligned}
f_e((a)_k) - f_e((b)_k) &= k^{e-3} \left[ 3(k-1)^2 e^2 - (5k+11)(k-1) e + 12(k+1) \right] / 24 \\
&= k^{e-3} \left\{ [3(k-1)e - (2k+8)] [(k-1)e - (k+1)] - 2(k-2)(k+1) \right\} / 24.
\end{aligned}
\]

$f_e((a)_k) - f_e((b)_k)$ is clearly increasing in $e$. So it suffices to prove $f_e((a)_k) - f_e((b)_k) \ge 0$ for $e = 4$. For $e = 4$,

\[
\begin{aligned}
f_e((a)_k) - f_e((b)_k) &= k \left[ 10(k-2)(3k-5) - 2(k-2)(k+1) \right] / 24 \\
&= k(k-2) \left[ 5(3k-5) - (k+1) \right] / 12 \\
&= k(k-2)(7k-13)/6 \ge 0 \quad \text{for } k \ge 2.
\end{aligned}
\]

\[
\begin{aligned}
f_e((c)_k) - f_e((b)_k) &= k^{e-4} \left\{ (k-1) \left[ 3(k-1) e^2 - (11k+5) e + 4(2k+5) \right] + 24 \right\} / 24 \\
&= k^{e-4} \left\{ (k-1) \left[ [3(k-1)e - 4(2k+5)](e-1) + 12e \right] + 24 \right\} / 24 \\
&= k^{e-4} \left\{ (k-1) \left[ [3(k-1)e - 4(2k+5) + 12](e-1) + 12 \right] + 24 \right\} / 24 \\
&= k^{e-4} \left\{ (k-1) \left[ [3(k-1)e - 8(k+1)](e-1) + 12 \right] + 24 \right\} / 24.
\end{aligned}
\]

$f_e((c)_k) - f_e((b)_k)$ is clearly increasing in $e$. So it suffices to prove $f_e((c)_k) - f_e((b)_k) \ge 0$ for $e = 4$. For $e = 4$,

\[
\begin{aligned}
f_e((c)_k) - f_e((b)_k) &= \left\{ (k-1) \left[ [12(k-1) - 8(k+1)] \cdot 3 + 12 \right] + 24 \right\} / 24 \\
&= \left[ (k-1)(12k - 48) + 24 \right] / 24 \\
&= (k^2 - 5k + 6)/2 \\
&= (k-2)(k-3)/2 \ge 0 \quad \text{for } k \ge 2.
\end{aligned}
\]

\[ f_e((d)_k) - f_e((b)_k) = k^{e-3} (ke - e - k - 1)/2 \ge 0. \]
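The counts in Lemma 2.4 and the comparisons above can be cross-checked by brute force for small $k$ and $e$. The sketch below rests on an assumption: Fig. 1 is not reproduced in this text, so the four shapes are inferred from the counting argument, namely (a) three edges through one vertex $x$, (b) a path of three edges meeting at $x$ and $y$, (c) two disjoint pairs of edges meeting at $x$ and at $y$, and (d) two edges sharing both $x$ and $y$. All helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, count
from math import comb

def count_covers(edges, d):
    """Brute-force number of d-subsets of the vertex set that meet every edge."""
    vertices = set().union(*edges)
    return sum(1 for S in combinations(vertices, d)
               if all(E.intersection(S) for E in edges))

def build(cores, k, e):
    """Pad each core of shared vertices up to rank k with fresh vertices,
    then add disjoint rank-k edges until there are e edges."""
    fresh, edges = count(), []
    for core in list(cores) + [set()] * (e - len(cores)):
        edge = set(core)
        while len(edge) < k:
            edge.add(next(fresh))
        edges.append(edge)
    return edges

# Assumed shapes of Fig. 1(a)-(d), inferred from the proof of Lemma 2.4.
CORES = {"a": [{"x"}, {"x"}, {"x"}],
         "b": [{"x"}, {"x", "y"}, {"y"}],
         "c": [{"x"}, {"x"}, {"y"}, {"y"}],
         "d": [{"x", "y"}, {"x", "y"}]}

def closed_form(name, k, e):
    K = Fraction(k)  # exact arithmetic: some exponents are negative when e = 4
    c2, c3 = comb(k, 2), comb(k, 3)
    if name == "a":
        return ((k - 1) ** 3 * K ** (e - 3) + comb(3 * k - 3, 2) * K ** (e - 3)
                + (3 * k - 3) * (e - 3) * c2 * K ** (e - 4)
                + (e - 3) * c3 * K ** (e - 4)
                + comb(e - 3, 2) * c2 ** 2 * K ** (e - 5))
    if name == "b":
        return ((k - 1) ** 2 * (k - 2) * K ** (e - 3)
                + 2 * ((comb(3 * k - 4, 2) - comb(2 * k - 3, 2)) * K ** (e - 3)
                       + (k - 1) * (e - 3) * c2 * K ** (e - 4))
                + (3 * k - 4) * K ** (e - 3) + (e - 3) * c2 * K ** (e - 4))
    if name == "c":
        return ((k - 1) ** 4 * K ** (e - 4)
                + 2 * ((2 * comb(k - 1, 2) * (k - 1)
                        + (k - 1) ** 2 * (2 * k - 2)) * K ** (e - 4)
                       + (k - 1) ** 2 * (e - 4) * c2 * K ** (e - 5))
                + comb(4 * k - 4, 2) * K ** (e - 4)
                + (4 * k - 4) * (e - 4) * c2 * K ** (e - 5)
                + (e - 4) * c3 * K ** (e - 5)
                + comb(e - 4, 2) * c2 ** 2 * K ** (e - 6))
    return ((k - 2) ** 2 * K ** (e - 2)
            + 2 * ((2 * k - 4) * K ** (e - 2) + (e - 2) * c2 * K ** (e - 3))
            + K ** (e - 2))

cases = [(2, 4), (3, 4), (2, 5), (3, 5)]
brute = {(name, k, e): count_covers(build(CORES[name], k, e), e)
         for name in CORES for k, e in cases}
for key in sorted(brute):
    print(key, brute[key], closed_form(*key))
```

For the parameter pairs listed, configuration (b) never has more covers than the other three, which is what Theorem 2.5 asserts for all $e \ge 4$ and $k \ge 2$.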

While we have no result on general $f_e(v; l_1, \dots, l_e)$, the following lemma helps us to obtain lower bounds.

Lemma 2.6. Suppose $l_i \ge l'_i$ for $1 \le i \le e$. Then $f_e(v; l_1, \dots, l_e) \ge f_e(v; l'_1, \dots, l'_e)$.

In particular, $f_e(v; l_1, \dots, l_e) \ge f_e(v, k^e)$ if $l_i \ge k$ for $1 \le i \le e$.

Lemma 2.7. Let $C$ be a vertex-cover of $G$ such that $|C| = c < e$, and let $S$ be the set of vertices in $G$ each of which is incident to at least two edges of $G$. Let $|E_i \setminus S| = l'_i$, $i = 1, 2, \dots, e$. Then

\[ f_e(v; l_1, \dots, l_e) \ge \binom{v - c}{e - c} + \prod l'_i. \]

Proof. $\binom{v - c}{e - c}$ represents the number of $e$-covers which contain $C$, and $\prod l'_i$ is the number of $e$-covers which are different from the above $e$-covers (since $c < e$, $C$ contains at least one vertex in $S$).

Proposition 2.8. If $e(k - 1) + 1 \le v < ek$, then $f_e(v, k^e) \ge (k - 2) k^{e-3}$.

Proof. Since $|S| \le k - 1$,

\[ \sum_{i=1}^{e} l'_i \ge k(e - 2). \]

Hence

\[ \prod_{l'_i > 0} l'_i \ge 1 \cdot 1 \cdot (k - 2) k^{e-3}. \]

3. A bound for the graph case

For the graph case, $l_i = 2$ for all $i$; hence no isolated vertex can be an edge. We will write $G$ instead of $H$ for a graph. In particular, $\mathcal{H}(v; l_1, \dots, l_e)$ will be written as $G(v, e)$ and $f_d(v; l_1, \dots, l_e)$ as $f_d(v, e)$. Theorems 2.2 and 2.3 then yield

\[ f_e(v, e) = 2^e \ \text{for } v \ge 2e \quad \text{and} \quad f_e(2e - 1, e) = 2^{e-1} + e\, 2^{e-3}. \]

Further, we have

Lemma 3.1. $f_e(e + 1, e) = e + 1$.

Proof. Since an edge has two vertices, any set of $e$ vertices must be a vertex-cover, as it leaves out at most one vertex of each edge. Clearly, there are $e + 1$ sets of $e$ vertices.

For the general case, we give a lower bound.

Theorem 3.2. $f_e(v, e) \ge 2^{v-e-1} (2e - v + 2)$ for $e + 1 \le v \le 2e - 1$.

Proof. By Lemma 2.1, it suffices to consider graphs with no isolated vertex. Suppose such a graph $G$ has $c$ components. Then $c \ge v - e$, where equality prevails only when each component is a tree.

Let component $C_i$ have $v_i$ vertices and $e_i$ edges. Then any choice of $v_i - 1$ vertices is a vertex-cover of $C_i$. Further, any $G$ has at least $v - e$ trees. Fix $v - e - 1$ trees, say $C_1, \dots, C_{v-e-1}$, of $G$ and let $G'$ consist of the remaining components ($G'$ is a tree if $G$ is a forest) with $v'$ vertices. Then any choice of $v' - 1$ vertices of $G'$ is a vertex-cover of $G'$, and there are $v'$ such sets. By taking $v_i - 1$ vertices from each $C_i$, $1 \le i \le v - e - 1$, and $v' - 1$ vertices of $G'$, we obtain an $e$-vertex-cover of $G$, and there are $v' \prod_{i=1}^{v-e-1} v_i$ of them, with $\sum_{i=1}^{v-e-1} v_i + v' = v$.

Since $ab \le (a + 1)(b - 1)$ for $a < b$, the product $v' \prod_{i=1}^{v-e-1} v_i$ is minimized by the most uneven distribution of the $v_i$, namely, all $v_i = 2$ except one $v_i = 2e - v + 2$.

Note that for $v = e + 1$, this bound yields a value of $2^0 (e + 1) = e + 1$, which matches $f_e(e + 1, e)$ as shown in Lemma 3.1.
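For very small parameters the bound of Theorem 3.2 can be compared with the exact minimum found by exhaustive search. The sketch below assumes simple graphs (all $e$ edges distinct) on labelled vertices $0, \dots, v-1$; the function names are hypothetical.

```python
from itertools import combinations

def count_covers(v, edges, d):
    """Number of d-subsets of {0, ..., v-1} touching every edge of the graph."""
    return sum(1 for S in combinations(range(v), d)
               if all(a in S or b in S for a, b in edges))

def min_covers(v, e):
    """f_e(v, e) by exhaustive search over simple graphs with e distinct edges."""
    pairs = list(combinations(range(v), 2))
    return min(count_covers(v, G, e) for G in combinations(pairs, e))

def theorem_3_2_bound(v, e):
    """Lower bound 2^(v-e-1) * (2e - v + 2), stated for e + 1 <= v <= 2e - 1."""
    assert e + 1 <= v <= 2 * e - 1
    return 2 ** (v - e - 1) * (2 * e - v + 2)

# (bound, exact minimum) for e = 3, 4 and every admissible v
table = {(v, e): (theorem_3_2_bound(v, e), min_covers(v, e))
         for e in (3, 4) for v in range(e + 1, 2 * e)}
for key, (bound, exact) in sorted(table.items()):
    print(key, bound, exact)
```

The search grows very quickly with $v$ and $e$, so it is only a sanity check for the smallest cases, not a way to compute $f_e(v, e)$ in general.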

But for $v - e > 1$, the bound can certainly be strengthened by allowing some trees or $G'$ to have all their vertices taken and other $C_i$ with $v_i \ge 3$ to have more than one vertex not taken. In particular, we have

Corollary 3.3. Suppose $v < 2e$. Then

\[ f_e(v, e) \ge 2^{v-e-1} (2e - v + 2) + \sum_{x = \min\{ e + 1 - \lceil v/2 \rceil,\ 3e - 2v + 2 \}}^{2e - v} 2^{x - (3e - 2v + 2)}. \]

Proof. It is easily verified that a connected graph with $n$ vertices has an $x$-vertex-cover for every $x \ge \lfloor n/2 \rfloor$. Since $G'$ has $2e - v + 2$ vertices, it has an $x$-vertex-cover for every $e + 1 - \lceil v/2 \rceil \le x \le 2e - v$. This $x$-vertex-cover plus one vertex from each of the $v - e - 1$ 2-trees constitute an $(x + v - e - 1)$-vertex-cover for $G$. Suppose $e - (x + v - e - 1) = 2e - x - v + 1 \le v - e - 1$, or $3e - 2v + 2 \le x$. Then we can choose two vertices from $2e - x - v + 1$ of the 2-trees and one from each of the remaining 2-trees to obtain an $e$-vertex-cover of $G$. The number of such choices is $2^{x - (3e - 2v + 2)}$.

4. An application

A long DNA molecule M is often cut into short segments called clones for easy storage and reproduction. Typically, it is cut more than once, with each cutting having independent cutpoints, to facilitate the reconstruction of M. One approach to reconstruction makes use of many sequence-tagged sites (STS), each of which is assumed to appear exactly once in M. By identifying, for each clone, which set of STSs it contains, we can use this information to sequence overlapping clones. The identification is done one STS at a time. A clone is called positive if it contains this STS and negative if not. Suppose d cuttings have been made. Because an STS appears only once in M, at most d clones can be positive ("at most" because an STS can be cut in half in a cutting, or a clone can be damaged).

Given a set of n clones containing e (e ≤ d) positive clones, where n can be in the thousands while e is single-digit, the currently most efficient way of identifying the positive clones is through group testing [1]. A group test applies to a subset of the n clones and has two possible outcomes: a positive outcome indicates that the subset contains a positive clone (without revealing which or how many), while a negative outcome indicates otherwise. Since each test is a biological experiment taking several hours, it is crucial that all tests can be performed in parallel. This implies that all subsets under testing must be determined simultaneously, which is known as nonadaptive group testing in the literature, or as a pooling design, the term preferred by biologists.

A major tool in constructing a pooling design is the d-disjunct matrix, which is a binary matrix such that if a column is viewed as a set of row indices (those rows with a 1-entry), then no column is covered by the union of any d columns. Let M denote the incidence matrix between test-subsets (rows) and clones (columns). Kautz and Singleton [2] proved that if M is d-disjunct, then it can identify the e positive clones if e ≤ d by simply noting that any clone which appears in a negative pool is a negative clone, and the others are positive clones.
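The decoding rule just described is short enough to sketch in code. The following Python example is an illustration only: the function names, the list-of-rows matrix layout, and the toy matrix (columns indexed by the 2-subsets of four pools) are assumptions, not taken from the cited works.

```python
from itertools import combinations

def is_d_disjunct(matrix, d):
    """True if no column's support lies inside the union of any d other columns."""
    n_cols = len(matrix[0])
    supports = [{r for r, row in enumerate(matrix) if row[c]} for c in range(n_cols)]
    for c in range(n_cols):
        others = [s for j, s in enumerate(supports) if j != c]
        for group in combinations(others, min(d, len(others))):
            if supports[c] <= set().union(*group):
                return False
    return True

def decode(matrix, outcomes):
    """Every clone in a negative pool is negative; the rest are declared positive."""
    n_cols = len(matrix[0])
    negative = {c for row, positive in zip(matrix, outcomes) if not positive
                for c in range(n_cols) if row[c]}
    return sorted(set(range(n_cols)) - negative)

# Toy design: 4 pools (rows), 6 clones (columns); clone c is placed in the
# two pools given by the c-th 2-subset of {0, 1, 2, 3}.
columns = list(combinations(range(4), 2))
matrix = [[1 if r in col else 0 for col in columns] for r in range(4)]

positives = {4}  # hypothetical single positive clone
outcomes = [any(row[c] for c in positives) for row in matrix]
print(is_d_disjunct(matrix, 1), is_d_disjunct(matrix, 2), decode(matrix, outcomes))
```

The toy matrix is 1-disjunct but not 2-disjunct, so the decoder is only guaranteed to be correct here when there is at most one positive clone and no test errors.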

Macula [3] generalized the notion of $d$-disjunctness to $d^z$-disjunctness, where every column has at least $z$ 1-entries not covered by the union of any other $d$ columns. A $d^z$-disjunct matrix allows a negative clone $C$ to be identified even though up [...] all recorded as positive, but $C$ still appears in at least $\lceil z/2 \rceil$ negative pools (correctly recorded) to be identified. Note that a $d$-disjunct matrix is a $d^1$-disjunct matrix and offers no error tolerance.

Ngo and Du [4] gave a construction of $d^z$-disjunct matrices. Consider $2k$ vertices. A matching is called an $m$-matching if it consists of $m$ matches. Then there are

\[ g(m, k) = \binom{2k}{2m} \frac{(2m)!}{m!\, 2^m} \]

$m$-matchings, where $m \le k$. Construct a $g(d, k) \times n$ matrix $M$ by indexing its rows by all the $d$-matchings, $d < k$, and its columns by $n$ arbitrary (but distinct) $k$-matchings. $M$ has a 1-entry in cell $(i, j)$ if and only if the index of row $i$ is contained in the index of column $j$. For each column $C$ and a set $D = \{D_1, \dots, D_d\}$ of other columns, define a hypergraph $H(C, D)$ whose vertices are the matches in the $k$-matching of $C$, and whose edge $E_i$ consists of the matches in $C \setminus D_i$, $1 \le i \le d$. Since each pair of maximum matchings differ in at least two matches, $l_i \equiv |E_i| \ge 2$. Ngo and Du proved that $M$ is $d^z$-disjunct with $z = d + 1$.

By Theorems 2.2 and 3.2 and Lemma 2.6, we improve $z$ to

\[ 2^{k-d-1} (2d - k + 2) \quad \text{for } d + 1 \le k \le 2d - 1, \]

and to $2^d$ for $2d \le k$.
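To get a feel for the sizes involved, the sketch below evaluates the number of $m$-matchings on $2k$ labelled vertices (using the standard count $\binom{2k}{2m} (2m)!/(m!\, 2^m)$) and the improved value of $z$, and compares the latter with $z = d + 1$. The function names are hypothetical.

```python
from math import comb, factorial

def num_m_matchings(m, k):
    """Number of m-matchings on 2k labelled vertices: choose the 2m matched
    vertices, then pair them up."""
    return comb(2 * k, 2 * m) * factorial(2 * m) // (factorial(m) * 2 ** m)

def improved_z(d, k):
    """Improved z: 2^(k-d-1) * (2d - k + 2) for d+1 <= k <= 2d-1, and 2^d for k >= 2d."""
    if k >= 2 * d:
        return 2 ** d
    if d + 1 <= k <= 2 * d - 1:
        return 2 ** (k - d - 1) * (2 * d - k + 2)
    raise ValueError("the construction requires d < k")

d = 3
for k in range(d + 1, 2 * d + 2):
    rows = num_m_matchings(d, k)  # number of rows g(d, k) of the matrix M
    print(k, rows, d + 1, improved_z(d, k))
```

For $k = d + 1$ the two values of $z$ coincide; the gain appears for larger $k$ and saturates at $2^d$ once $k \ge 2d$.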

Acknowledgements

The authors would like to thank the referees for their helpful comments in revising this paper.

The first, third and fourth authors’ research was partially supported by a Republic of China National Science Council grant NSC 92-2115-M-009-014. The second author’s research was partially supported by a Republic of China National Science Council grant NSC 92-2115-M-009-004.

References

[1] D.Z. Du, F.K. Hwang, Pooling Designs and Nonadaptive Group Testing, World Scientific, Singapore, 2006.

[2] W.H. Kautz, R.C. Singleton, Nonrandom binary superimposed codes, IEEE Trans. Inform. Thy. 10 (1964) 363–377.

[3] A.J. Macula, Error-correcting nonadaptive group testing with $d^e$-disjunct matrices, Discrete Appl. Math. 80 (1997) 217–222.

[4] H.Q. Ngo, D.Z. Du, New constructions of non-adaptive and error-tolerance pooling designs, Discrete Math. 243 (2002) 161–170.