Discrete Applied Mathematics
journal homepage: www.elsevier.com/locate/dam
The minimum number of e-vertex-covers among hypergraphs with e
edges of given ranks
F.H. Chang a,∗, H.L. Fu b, F.K. Hwang b, B.C. Lin c
a Department of Applied Mathematics, National Chiayi University, Chiayi City, 60004, Taiwan
b Department of Applied Mathematics, National Chiao Tung University, Hsinchu, 30050, Taiwan
c Department of Math, National Central University, Chung-Li, 32054, Taiwan
Article info
Article history:
Received 25 May 2004
Received in revised form 28 August 2007
Accepted 21 May 2008
Available online 30 June 2008
Keywords: Vertex-cover; Hypergraph; Pooling design
Abstract
We study the following problem: among all hypergraphs with $e$ edges of ranks $l_1, \ldots, l_e$ and $v$ vertices, which hypergraph has the least number of vertex-covers of size $e$? The problem is very difficult and we obtain only partial answers. We show an application of our results to improve the error-tolerance of a pooling design proposed in the literature.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Let $H(V, E)$ denote a hypergraph with vertex set $V$ and edge set $E = \{E_1, \ldots, E_e\}$. Let $l_i$ denote the rank of $E_i$, i.e., $l_i = |E_i|$; note that $l_i = 1$ is allowed. A subset $V' \subseteq V$ is called a vertex-cover of $H$ if it intersects every $E_i \in E$. It is called a $d$-vertex-cover if $|V'| = d$. Let $f_d(H)$ denote the number of $d$-vertex-covers in $H$. Define
$$\mathcal{H}(v; l_1, \ldots, l_e) = \{H(V, E) : |V| = v,\ |E| = e,\ |E_i| = l_i,\ i = 1, \ldots, e\}$$
and
$$f_d(v; l_1, \ldots, l_e) = \min\{f_d(H) : H \in \mathcal{H}(v; l_1, \ldots, l_e)\}.$$
We study $f_d(v; l_1, \ldots, l_e)$ in this paper, focusing on the case $d = e$. This problem is motivated by the construction of error-detecting and error-correcting pooling designs (used in clone library screening). Ngo and Du [4] observed that the number of $e$-vertex-covers has a bearing on the error-correcting ability of a pooling design.
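The definition of $f_d(H)$ can be made concrete by direct enumeration. The following is a minimal sketch, not part of the paper: the function name `count_vertex_covers` and the labelling of vertices as `0, ..., v-1` are illustrative assumptions, and the approach is only practical for small $v$.

```python
from itertools import combinations


def count_vertex_covers(num_vertices, edges, d):
    """Count d-subsets of {0, ..., num_vertices - 1} that meet every edge."""
    edge_sets = [frozenset(edge) for edge in edges]
    return sum(
        1
        for subset in combinations(range(num_vertices), d)
        if all(edge & set(subset) for edge in edge_sets)
    )


# Two disjoint edges of ranks 2 and 3 on v = 5 vertices.
print(count_vertex_covers(5, [{0, 1}, {2, 3, 4}], 2))  # 6

# The same ranks on v = 4 vertices, so the edges share vertex 1.
print(count_vertex_covers(4, [{0, 1}, {1, 2, 3}], 2))  # 5
```

In both examples $d = e = 2$; the hypergraph with fewer vertices forces an intersection and, here, yields fewer covers.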
2. The hypergraph case
A vertex is called an isolated vertex if it is not in any edge. A hypergraph $H(V, E)$ is called optimal in $\mathcal{H}(v; l_1, \ldots, l_e)$ if it achieves $f_d(v; l_1, \ldots, l_e)$.
Lemma 2.1. For any given $\mathcal{H}(v; l_1, \ldots, l_e)$, there exists an optimal $H^*$ such that $H^*$ does not have both an isolated vertex and a vertex shared by two edges.
∗ Corresponding author. Tel.: +886 5 2717917.
E-mail addresses: fei@mail.ncyu.edu.tw (F.H. Chang), hlfu@math.nctu.edu.tw (H.L. Fu), fkhwang@gmail.com (F.K. Hwang), beychi.lin@gmail.com (B.C. Lin).
0166-218X/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.dam.2008.05.006
Proof. Let $H$ be a hypergraph in $\mathcal{H}(v; l_1, \ldots, l_e)$ with an isolated vertex $i$ and a vertex $j$ shared by two edges $a$ and $b$. Let $H'$ be obtained from $H$ by eliminating $i$ and splitting $j$ into two vertices $j_a$ and $j_b$ such that $j_a \in a$ and $j_b \in b$, i.e., $a$ and $b$ no longer share the vertex $j$. We show $f_e(H') \le f_e(H)$ by mapping all $e$-vertex-covers $C'$ of $H'$ to distinct $e$-vertex-covers $C$ of $H$. We consider several cases:
(i) $C'$ contains neither $j_a$ nor $j_b$. Set $C = C'$.
(ii) $C'$ contains both $j_a$ and $j_b$. $C$ is obtained by replacing $j_a$ with $j$ and $j_b$ with $i$.
(iii) $C'$ contains $j_a$ but not $j_b$. $C$ is obtained by replacing $j_a$ with $j$.
(iv) $C'$ contains $j_b$ but not $j_a$, and no other vertex of $b$. $C$ is obtained by replacing $j_b$ with $j$.
(v) $C'$ contains $j_b$ and another vertex of $b$, but not $j_a$. $C$ is obtained by replacing $j_b$ with $i$.
Clearly, $C$ is an $e$-vertex-cover of $H$. To check that the mapping is injective, observe that two distinct $e$-vertex-covers $C'$ and $C''$ of $H'$ can map to the same $C$ only when they differ in exactly one vertex, i.e., $C' = S \cup \{j_a\}$ and $C'' = S \cup \{j_b\}$, where $S$ is a set of $e - 1$ vertices containing neither $j_a$ nor $j_b$. Since $C''$ covers $a$ without using $j_a$, $S$ must contain a vertex of $a$ other than $j_a$; likewise, since $C'$ covers $b$ without using $j_b$, $S$ must contain a vertex of $b$ other than $j_b$. Hence $S$ contains both a vertex of $a$ and a vertex of $b$. By our mapping rule, $C'$ is mapped to $S \cup \{j\}$ (case (iii)), while $C''$ is mapped to $S \cup \{i\}$ (case (v)), two distinct $e$-vertex-covers of $H$.
By repeating this procedure, and since $|\mathcal{H}(v; l_1, \ldots, l_e)|$ is finite, we eventually reach a hypergraph which has either no isolated vertices or no vertices shared by two edges. Since this argument holds for all $H \in \mathcal{H}(v; l_1, \ldots, l_e)$, Lemma 2.1 follows.
Define $l = \sum_{i=1}^{e} l_i$ and $L = \prod_{i=1}^{e} l_i$. For $v \ge l$, from Lemma 2.1, $f_e(v; l_1, \ldots, l_e)$ is nondecreasing in $v$. We next show that it reaches its maximum at a certain $v$.
Theorem 2.2. $f_e(v; l_1, \ldots, l_e) = L$ for $v \ge l$.
Proof. Since any $H \in \mathcal{H}(v; l_1, \ldots, l_e)$ with two intersecting edges must have an isolated vertex, by Lemma 2.1 there exists an optimal hypergraph $H^*$ with no intersecting edges, i.e., $H^*$ contains $e$ disjoint edges of ranks $l_1, \ldots, l_e$ and $v - l$ isolated vertices. Clearly, $H^*$ has $L$ $e$-vertex-covers.
Theorem 2.3.
$$f_e(l - 1; l_1, \ldots, l_e) = \prod_{i \ne 1,2} l_i \left[ l_1 l_2 - 1 + \frac{1}{2} \sum_{i \ne 1,2} (l_i - 1) \right].$$
Proof. Without loss of generality, assume $l_1 \ge l_2 \ge \cdots \ge l_e$. By Lemma 2.1, it suffices to consider $H$ with no isolated vertices.
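Since $v = l - 1$, $H$ contains exactly two edges intersecting in one vertex. Let $H_{mn}$ denote the hypergraph where $E_m$ and $E_n$ intersect, and $f_e(H_{mn})$ its number of $e$-vertex-covers. Then
$$\begin{aligned}
f_e(H_{mn}) &= \left[(l_m - 1)(l_n - 1) + l_m + l_n - 2\right] \prod_{i \ne m,n} l_i + \sum_{k \ne m,n} \binom{l_k}{2} \prod_{i \ne m,n,k} l_i \\
&= (l_m l_n - 1) \prod_{i \ne m,n} l_i + \frac{1}{2} \sum_{k \ne m,n} l_k (l_k - 1) \prod_{i \ne m,n,k} l_i \\
&= (l_m l_n - 1) \prod_{i \ne m,n} l_i + \frac{1}{2} \sum_{k \ne m,n} (l_k - 1) \prod_{i \ne m,n} l_i \\
&= \prod_{i \ne m,n} l_i \left[ l_m l_n - 1 + \frac{1}{2} \sum_{i \ne m,n} (l_i - 1) \right].
\end{aligned}$$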
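If $l_1 = l_2 = \cdots = l_e$, then Theorem 2.3 obviously holds; if not, we prove the following. Suppose $l_y > l_z$. Then $f_e(H_{xy}) < f_e(H_{xz})$ for all other $x \in \{1, \ldots, e\}$. Indeed,
$$\begin{aligned}
f_e(H_{xy}) &= \prod_{i \ne x,y} l_i \left[ l_x l_y - 1 + \frac{1}{2} \sum_{i \ne x,y} (l_i - 1) \right] \\
&= \prod_{i \ne x,y,z} l_i \left[ l_x l_y l_z - l_z + \frac{1}{2} l_z (l_z - 1) + \frac{1}{2} l_z \sum_{i \ne x,y,z} (l_i - 1) \right] \\
&= \prod_{i \ne x,y,z} l_i \left[ l_x l_y l_z + \frac{1}{2} l_z (l_z - 3) + \frac{1}{2} l_z \sum_{i \ne x,y,z} (l_i - 1) \right] \\
&< \prod_{i \ne x,y,z} l_i \left[ l_x l_y l_z + \frac{1}{2} l_y (l_y - 3) + \frac{1}{2} l_y \sum_{i \ne x,y,z} (l_i - 1) \right] \\
&= f_e(H_{xz}).
\end{aligned}$$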
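Theorems 2.2 and 2.3 can be sanity-checked numerically on small instances. The sketch below is illustrative only and not taken from the paper: the helper names (`count_covers`, `disjoint_edges`, `glued_hypergraph`, `closed_form`) and the example ranks are assumptions. It compares a brute-force count of $e$-vertex-covers of $H_{mn}$ with the closed form $\prod_{i \ne m,n} l_i [\, l_m l_n - 1 + \frac{1}{2}\sum_{i \ne m,n}(l_i - 1)]$.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def count_covers(vertices, edges, d):
    """Number of d-subsets of `vertices` that meet every edge."""
    return sum(
        1
        for subset in combinations(vertices, d)
        if all(edge & set(subset) for edge in edges)
    )


def disjoint_edges(ranks):
    """Pairwise disjoint edges of the given ranks on vertices 0, 1, 2, ..."""
    edges, start = [], 0
    for rank in ranks:
        edges.append(set(range(start, start + rank)))
        start += rank
    return edges


def glued_hypergraph(ranks, m, n):
    """H_mn: disjoint edges, except that edges m and n share one vertex."""
    edges = disjoint_edges(ranks)
    edges[n] = (edges[n] - {min(edges[n])}) | {min(edges[m])}
    vertices = sorted(set().union(*edges))  # l - 1 vertices, none isolated
    return vertices, edges


def closed_form(ranks, m, n):
    """Closed form for f_e(H_mn) from the proof of Theorem 2.3."""
    others = [r for i, r in enumerate(ranks) if i not in (m, n)]
    bracket = ranks[m] * ranks[n] - 1 + Fraction(sum(r - 1 for r in others), 2)
    return int(prod(others) * bracket)


ranks = [3, 3, 2, 2]  # l = 10, L = 36, e = 4
e = len(ranks)

# Theorem 2.2: disjoint edges plus two isolated vertices (v = 12 >= l).
print(count_covers(range(12), disjoint_edges(ranks), e))  # 36

# Theorem 2.3: brute force versus closed form for every glued pair (m, n).
for m, n in combinations(range(e), 2):
    vertices, edges = glued_hypergraph(ranks, m, n)
    print((m, n), count_covers(vertices, edges, e), closed_form(ranks, m, n))
```

For these ranks the smallest count is obtained by gluing the two largest edges, in line with the comparison $f_e(H_{xy}) < f_e(H_{xz})$ for $l_y > l_z$.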
Fig. 1. Graphs with $v = l - 2$ vertices and $e$ edges.
For $v = l - 2$, we have to restrict our attention to the case $l_1 = l_2 = \cdots = l_e = k$. Assume $e \ge 4$; then there are four hypergraphs, as shown in Fig. 1.
We count the number of e-vertex-covers for each of them.
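Lemma 2.4. For $v = l - 2$ and $l_1 = l_2 = \cdots = l_e = k$, the numbers of $e$-vertex-covers for the graphs in Fig. 1 are, respectively, as follows:
(i)
$$f_e((a)_k) = (k-1)^3 k^{e-3} + \left\{ \binom{3k-3}{2} k^{e-3} + \binom{3k-3}{1} \binom{e-3}{1} \binom{k}{2} k^{e-4} + \binom{3k-3}{0} \left[ \binom{e-3}{1} \binom{k}{3} k^{e-4} + \binom{e-3}{2} \binom{k}{2}^{2} k^{e-5} \right] \right\}.$$
(ii)
$$f_e((b)_k) = (k-1)^2 (k-2) k^{e-3} + 2 \left\{ \left[ \binom{3k-4}{2} - \binom{2k-3}{2} \right] k^{e-3} + \binom{k-1}{1} \binom{e-3}{1} \binom{k}{2} k^{e-4} \right\} + \left\{ \binom{3k-4}{1} k^{e-3} + \binom{3k-4}{0} \binom{e-3}{1} \binom{k}{2} k^{e-4} \right\}.$$
(iii)
$$\begin{aligned}
f_e((c)_k) = {} & (k-1)^4 k^{e-4} + 2 \left\{ \left[ 2 \binom{k-1}{2} \binom{k-1}{1} + \binom{k-1}{1}^{2} \binom{2k-2}{1} \right] k^{e-4} + \binom{k-1}{1}^{2} \binom{e-4}{1} \binom{k}{2} k^{e-5} \right\} \\
& + \left\{ \binom{4k-4}{2} k^{e-4} + \binom{4k-4}{1} \binom{e-4}{1} \binom{k}{2} k^{e-5} + \binom{4k-4}{0} \left[ \binom{e-4}{1} \binom{k}{3} k^{e-5} + \binom{e-4}{2} \binom{k}{2}^{2} k^{e-6} \right] \right\}.
\end{aligned}$$
(iv)
$$f_e((d)_k) = (k-2)^2 k^{e-2} + 2 \left\{ \binom{2k-4}{1} k^{e-2} + \binom{2k-4}{0} \binom{e-2}{1} \binom{k}{2} k^{e-3} \right\} + k^{e-2}.$$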
Proof. We start with the proof of (i) for Fig. 1(a). Clearly, if an $e$-vertex-cover does not contain the vertex $x$, then each of the three edges incident to $x$ has $k - 1$ choices for its cover vertex and all the other edges have $k$ choices. Thus, the number of such $e$-vertex-covers is equal to $(k-1)^3 k^{e-3}$, which gives the first term. On the other hand, if an $e$-vertex-cover does contain $x$, then it contains at most two other vertices in the $x$-tree, since we have $e - 2$ components. So the second term can be obtained by considering the number of extra vertices in the $x$-tree (different from $x$) which are chosen for the $e$-vertex-cover. Now, it is not difficult to see that if the $e$-vertex-cover contains three vertices of the $x$-tree, then it contains exactly one vertex of each of the other $e - 3$ edges; if the $e$-vertex-cover contains two vertices of the $x$-tree, then one of the remaining $e - 3$ edges contains exactly two vertices of the cover and each of the other $e - 4$ edges contains exactly one; and if the $e$-vertex-cover contains only $x$ in the $x$-tree, then we have the choices $(3, 1, 1, \ldots, 1)$ or $(2, 2, 1, \ldots, 1)$ for the other $e - 3$ edges. Therefore, we have the second term.
Following the same line of reasoning, in each of (ii), (iii), (iv), the terms are broken down into taking none of $x, y$, one of them, and both of them. It is worth noting that in (ii) the factor $\binom{3k-4}{2} - \binom{2k-3}{2}$ represents the case that, assuming $x$ is taken but not $y$, two other vertices are taken from the $(x, y)$-tree, at least one of them from the edge not incident to $x$; and in (iii) the last three terms represent taking both $x$ and $y$, and 2 or 1 or 0 other vertices from the $x$-tree and the $y$-tree.
Theorem 2.5. $f_e(l - 2, k^e) = f_e((b)_k)$ for $e \ge 4$ and $k \ge 2$.
Proof. Here $f_e(l - 2, k^e)$ denotes $f_e(l - 2; l_1, \ldots, l_e)$ in the case that all $e$ edges have rank $k$. By using MAPLE, we obtain
$$\begin{aligned}
f_e((a)_k) - f_e((b)_k) &= k^{e-3} \left[ 3(k-1)^2 e^2 - (5k+11)(k-1) e + 12(k+1) \right] / 24 \\
&= k^{e-3} \left\{ \left[ 3(k-1)e - (2k+8) \right] \left[ (k-1)e - (k+1) \right] - 2(k-2)(k+1) \right\} / 24.
\end{aligned}$$
$f_e((a)_k) - f_e((b)_k)$ is clearly increasing in $e$, so it suffices to prove $f_e((a)_k) - f_e((b)_k) \ge 0$ for $e = 4$. For $e = 4$,
$$\begin{aligned}
f_e((a)_k) - f_e((b)_k) &= k \left[ 10(k-2)(3k-5) - 2(k-2)(k+1) \right] / 24 \\
&= k(k-2) \left[ 5(3k-5) - (k+1) \right] / 12 \\
&= k(k-2)(7k-13)/6 \ge 0 \quad \text{for } k \ge 2.
\end{aligned}$$
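$$\begin{aligned}
f_e((c)_k) - f_e((b)_k) &= k^{e-4} \left\{ (k-1) \left[ 3(k-1)e^2 - (11k+5)e + 4(2k+5) \right] + 24 \right\} / 24 \\
&= k^{e-4} \left\{ (k-1) \left[ \left[ 3(k-1)e - 4(2k+5) \right](e-1) + 12e \right] + 24 \right\} / 24 \\
&= k^{e-4} \left\{ (k-1) \left[ \left[ 3(k-1)e - 4(2k+5) + 12 \right](e-1) + 12 \right] + 24 \right\} / 24 \\
&= k^{e-4} \left\{ (k-1) \left[ \left[ 3(k-1)e - 8(k+1) \right](e-1) + 12 \right] + 24 \right\} / 24.
\end{aligned}$$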
$f_e((c)_k) - f_e((b)_k)$ is clearly increasing in $e$, so it suffices to prove $f_e((c)_k) - f_e((b)_k) \ge 0$ for $e = 4$. For $e = 4$,
$$\begin{aligned}
f_e((c)_k) - f_e((b)_k) &= \left\{ (k-1) \left[ \left[ 12(k-1) - 8(k+1) \right] \cdot 3 + 12 \right] + 24 \right\} / 24 \\
&= \left[ (k-1)(12k - 48) + 24 \right] / 24 \\
&= (k^2 - 5k + 6)/2 \\
&= (k-2)(k-3)/2 \ge 0 \quad \text{for } k \ge 2.
\end{aligned}$$
$$f_e((d)_k) - f_e((b)_k) = k^{e-3}(ke - e - k - 1)/2 \ge 0.$$
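The closed forms of Lemma 2.4 and the conclusion of Theorem 2.5 can be cross-checked by enumeration for small $k$ and $e$. The sketch below is a hedged illustration rather than the authors' MAPLE computation. Since Fig. 1 itself is not reproduced here, the four shapes are an assumption inferred from the proof text: (a) three edges sharing one vertex $x$, (b) three edges forming a path through $x$ and $y$, (c) two separate pairs of edges glued at $x$ and at $y$, (d) two edges sharing both $x$ and $y$. All function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, count
from math import comb as c


def closed_forms(k, e):
    """Lemma 2.4 (i)-(iv) for k >= 2 and e >= 4, evaluated exactly."""
    K = Fraction(k)  # exact arithmetic: exponents such as e - 5 can be negative
    double = c(k, 2)  # ways for one free edge to hold two cover vertices
    a = ((k - 1) ** 3 * K ** (e - 3)
         + c(3 * k - 3, 2) * K ** (e - 3)
         + c(3 * k - 3, 1) * c(e - 3, 1) * double * K ** (e - 4)
         + c(e - 3, 1) * c(k, 3) * K ** (e - 4)
         + c(e - 3, 2) * double ** 2 * K ** (e - 5))
    b = ((k - 1) ** 2 * (k - 2) * K ** (e - 3)
         + 2 * ((c(3 * k - 4, 2) - c(2 * k - 3, 2)) * K ** (e - 3)
                + c(k - 1, 1) * c(e - 3, 1) * double * K ** (e - 4))
         + c(3 * k - 4, 1) * K ** (e - 3)
         + c(e - 3, 1) * double * K ** (e - 4))
    cc = ((k - 1) ** 4 * K ** (e - 4)
          + 2 * ((2 * c(k - 1, 2) * c(k - 1, 1)
                  + c(k - 1, 1) ** 2 * c(2 * k - 2, 1)) * K ** (e - 4)
                 + c(k - 1, 1) ** 2 * c(e - 4, 1) * double * K ** (e - 5))
          + c(4 * k - 4, 2) * K ** (e - 4)
          + c(4 * k - 4, 1) * c(e - 4, 1) * double * K ** (e - 5)
          + c(e - 4, 1) * c(k, 3) * K ** (e - 5)
          + c(e - 4, 2) * double ** 2 * K ** (e - 6))
    d = ((k - 2) ** 2 * K ** (e - 2)
         + 2 * (c(2 * k - 4, 1) * K ** (e - 2)
                + c(e - 2, 1) * double * K ** (e - 3))
         + K ** (e - 2))
    return {name: int(value) for name, value in zip("abcd", (a, b, cc, d))}


# Assumed shapes: shared vertices of the non-free edges (0 is x, 1 is y).
SHARED = {"a": [[0], [0], [0]], "b": [[0], [0, 1], [1]],
          "c": [[0], [0], [1], [1]], "d": [[0, 1], [0, 1]]}


def brute_force(k, e, shape):
    """Count e-vertex-covers of the assumed Fig. 1 shape by enumeration."""
    fresh = count(2)  # new vertex labels, used to pad each edge up to rank k
    edges = [set(s) | {next(fresh) for _ in range(k - len(s))}
             for s in SHARED[shape] + [[]] * (e - len(SHARED[shape]))]
    vertices = sorted(set().union(*edges))  # e*k - 2 vertices in every shape
    return sum(all(edge & set(s) for edge in edges)
               for s in combinations(vertices, e))


print(closed_forms(3, 4))  # {'a': 88, 'b': 84, 'c': 84, 'd': 90}
for k, e in [(2, 4), (3, 4), (2, 5), (3, 5)]:
    exact = closed_forms(k, e)
    assert all(exact[s] == brute_force(k, e, s) for s in "abcd")
    assert exact["b"] == min(exact.values())  # Theorem 2.5
```

Exact rational arithmetic is used only so that terms with a zero binomial coefficient and a negative power of $k$ vanish cleanly; every returned value is an integer.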
While we have no result on general $f_e(v; l_1, \ldots, l_e)$, the following lemma helps us to obtain lower bounds.
Lemma 2.6. Suppose $l_i \ge l'_i$ for $1 \le i \le e$. Then $f_e(v; l_1, \ldots, l_e) \ge f_e(v; l'_1, \ldots, l'_e)$.
In particular, $f_e(v; l_1, \ldots, l_e) \ge f_e(v, k^e)$ if $l_i \ge k$ for $1 \le i \le e$.
Lemma 2.7. Let $C$ be a vertex-cover of $G$ such that $|C| = c < e$, and let $S$ be the set of vertices of $G$ that are each incident to at least two edges of $G$. Let $|E_i \setminus S| = l'_i$, $i = 1, 2, \ldots, e$. Then
$$f_e(v; l_1, \ldots, l_e) \ge \binom{v-c}{e-c} + \prod l'_i.$$
Proof. $\binom{v-c}{e-c}$ represents the number of $e$-covers which contain $C$, and $\prod l'_i$ is the number of $e$-covers which are different from the above $e$-covers (since $c < e$, $C$ contains at least one vertex in $S$).
Proposition 2.8. If $e(k-1) + 1 \le v < ek$, then $f_e(v, k^e) \ge (k-2) k^{e-3}$.
Proof. Since $|S| \le k - 1$,
$$\sum_{i=1}^{e} l'_i \ge k(e-2).$$
Hence
$$\prod_{l'_i > 0} l'_i \ge 1 \cdot 1 \cdot (k-2) k^{e-3}.$$
3. A bound for the graph case
For the graph case $l_i = 2$ for all $i$; hence no isolated vertex can be an edge. We will write $G$ instead of $H$ for a graph. In particular, $\mathcal{H}(v; l_1, \ldots, l_e)$ will be written as $\mathcal{G}(v, e)$ and $f_d(v; l_1, \ldots, l_e)$ as $f_d(v, e)$. Theorems 2.2 and 2.3 then yield $f_e(v, e) = 2^e$ for $v \ge 2e$ and $f_e(2e - 1, e) = 2^{e-1} + e 2^{e-3}$. Further, we have
Lemma 3.1. $f_e(e + 1, e) = e + 1$.
Proof. Since an edge has two vertices, any set of $e$ vertices must be a vertex-cover, as it leaves out at most one vertex of each edge. Clearly, there are $e + 1$ sets of $e$ vertices.
For the general case, we give a lower bound.
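For very small parameters, the graph-case values stated so far, and the lower bound $2^{v-e-1}(2e - v + 2)$ of Theorem 3.2 below, can be compared with an exhaustive search. The following sketch is illustrative and rests on stated assumptions: graphs are taken to be simple graphs on $v$ labelled vertices (isolated vertices allowed, no repeated edges), and the names `min_e_covers` and `theorem_3_2_bound` are hypothetical.

```python
from itertools import combinations


def min_e_covers(v, e):
    """Minimum number of e-vertex-covers over all simple graphs with
    v labelled vertices and e edges (exhaustive, small cases only)."""
    all_pairs = list(combinations(range(v), 2))
    subsets = [set(s) for s in combinations(range(v), e)]
    best = None
    for edges in combinations(all_pairs, e):
        covers = sum(all(a in s or b in s for a, b in edges) for s in subsets)
        if best is None or covers < best:
            best = covers
    return best


def theorem_3_2_bound(v, e):
    """Lower bound of Theorem 3.2, stated for e + 1 <= v <= 2e - 1."""
    return 2 ** (v - e - 1) * (2 * e - v + 2)


e = 3
for v in range(e + 1, 2 * e):
    print(v, min_e_covers(v, e), theorem_3_2_bound(v, e))
```

The search space grows as $\binom{\binom{v}{2}}{e}$, so this is only a check for toy sizes, not a way to compute $f_e(v, e)$ in general.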
Theorem 3.2. $f_e(v, e) \ge 2^{v-e-1}(2e - v + 2)$ for $e + 1 \le v \le 2e - 1$.
Proof. By Lemma 2.1, it suffices to consider graphs with no isolated vertex. Suppose such a graph $G$ has $c$ components. Then $c \ge v - e$, where equality holds only when each component is a tree.
Let component $C_i$ have $v_i$ vertices and $e_i$ edges. Then any choice of $v_i - 1$ vertices is a vertex-cover of $C_i$. Further, any $G$ has at least $v - e$ trees. Fix $v - e - 1$ trees, say $C_1, \ldots, C_{v-e-1}$, of $G$ and let $G'$ consist of the remaining components ($G'$ is a tree if $G$ is a forest) with $v'$ vertices. Then any choice of $v' - 1$ vertices of $G'$ is a vertex-cover of $G'$, and there are $v'$ such sets. By taking $v_i - 1$ vertices from each $C_i$, $1 \le i \le v - e - 1$, and $v' - 1$ vertices of $G'$, we obtain an $e$-vertex-cover of $G$, and there are $v' \prod_{i=1}^{v-e-1} v_i$ of them, with $\sum_{i=1}^{v-e-1} v_i + v' = v$.
Since for $a < b$, $ab \le (a+1)(b-1)$, the product $v' \prod_{i=1}^{v-e-1} v_i$ is minimized by the most uneven distribution of the $v_i$, namely, all $v_i = 2$ except one $v_i = 2e - v + 2$.
Note that for $v = e + 1$, this bound yields a value of $2^0 (e + 1) = e + 1$, which matches $f_e(e + 1, e)$ as shown in Lemma 3.1. But for $v - 1 > 1$, the bound can certainly be strengthened by allowing some trees or $G'$ to have all their vertices taken and other $C_i$ with $v_i \ge 3$ to have more than one vertex not taken. In particular, we have
Corollary 3.3. Suppose $v < 2e$. Then
$$f_e(v, e) \ge 2^{v-e-1}(2e - v + 2) + \sum_{x = \min\{e + 1 - \lceil v/2 \rceil,\ 3e - 2v + 2\}}^{2e - v} 2^{x - (3e - 2v + 2)}.$$
Proof. It is easily verified that a connected graph with $n$ vertices has an $x$-vertex-cover for every $x \ge \lfloor n/2 \rfloor$. Since $G'$ has $2e - v + 2$ vertices, it has an $x$-vertex-cover for every $e + 1 - \lceil v/2 \rceil \le x \le 2e - v$. This $x$-vertex-cover plus one vertex from each of the $v - e - 1$ 2-trees constitutes an $(x + v - e - 1)$-vertex-cover of $G$. Suppose $e - (x + v - e - 1) = 2e - x - v + 1 \le v - e - 1$, or $3e - 2v + 2 \le x$. Then we can choose two vertices from $2e - x - v + 1$ of the 2-trees and one from each of the remaining 2-trees to obtain an $e$-vertex-cover of $G$. The number of such choices is $2^{x - (3e - 2v + 2)}$.
4. An application
A long DNA molecule $M$ is often cut into short segments called clones for easy storage and reproduction. Typically, it is cut more than once, with each cutting having independent cutpoints, to facilitate reconstruction of $M$. One approach to reconstruction is to make use of many sequence-tagged sites (STS), each of which is assumed to have a unique appearance in $M$. By identifying for each clone which set of STS it contains, we can use this information to sequence overlapped clones. The identification is done one STS at a time. A clone is called positive if it contains this STS and negative if not. Suppose $d$ cuttings have been made. Due to the unique presence of an STS in $M$, at most $d$ clones can be positive ("at most" because an STS can be cut in half in a cutting, or a clone can be damaged).
Given a set of $n$ clones containing $e$ ($e \le d$) positive clones, where $n$ can be in the thousands while $e$ is single-digit, the currently most efficient way of identifying the positive clones is through group testing [1]. A group test applies to a subset of the $n$ clones with two possible outcomes: a positive outcome indicates that the subset contains a positive clone (not knowing which or how many), while a negative outcome indicates otherwise. Since each test is a biological experiment taking several hours, it is crucial that all tests can be performed in parallel. This implies that all subsets under testing must be determined simultaneously, known as nonadaptive group testing in the literature, or a pooling design as preferred by biologists.
A major tool in constructing a pooling design is the $d$-disjunct matrix, which is a binary matrix such that if a column is viewed as a set of row indices (those rows with a 1-entry), then no column is covered by the union of any $d$ other columns. Let $M$ denote the incidence matrix between test-subsets (rows) and clones (columns). Kautz and Singleton [2] proved that if $M$ is $d$-disjunct, then it can identify the $e$ positive clones if $e \le d$ by simply noting that any clone which appears in a negative pool is a negative clone, and the others are positive clones.
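The decoding rule just described (a clone in any negative pool is negative, all others are positive) is short enough to sketch directly. The code below is an illustration under stated assumptions, not an implementation from the paper or from [2]: the function names are hypothetical, tests are assumed error-free, and the toy matrix takes as columns all 2-subsets of 4 pools, which is 1-disjunct because no 2-subset contains another.

```python
from itertools import combinations


def is_d_disjunct(matrix, d):
    """True if no column's support lies in the union of d other columns."""
    n = len(matrix[0])
    supports = [{r for r, row in enumerate(matrix) if row[c]} for c in range(n)]
    for c in range(n):
        others = [j for j in range(n) if j != c]
        for group in combinations(others, d):
            union = set().union(*(supports[j] for j in group))
            if supports[c] <= union:
                return False
    return True


def run_tests(matrix, positives):
    """Error-free outcomes: a pool is positive iff it holds a positive clone."""
    return [any(row[c] for c in positives) for row in matrix]


def decode(matrix, outcomes):
    """Declare negative every clone in a negative pool; the rest are positive."""
    n = len(matrix[0])
    negative = {c
                for row, outcome in zip(matrix, outcomes) if not outcome
                for c in range(n) if row[c]}
    return [c for c in range(n) if c not in negative]


# Toy design: 4 pools (rows), 6 clones (columns), one column per 2-subset of pools.
columns = list(combinations(range(4), 2))
pools = [[1 if r in col else 0 for col in columns] for r in range(4)]

print(is_d_disjunct(pools, 1))            # True
print(decode(pools, run_tests(pools, [4])))  # [4]
```

With more positives than the disjunctness level allows, the same decoder can return false positives, which is why the level $d$ must be at least the number $e$ of positive clones.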
Macula [3] generalized the notion of $d$-disjunctness to $d^z$-disjunctness, where every column has at least $z$ 1-entries not covered by the union of any other $d$ columns. A $d^z$-disjunct matrix allows a negative clone $C$ to be identified even though up all recorded as positive, but $C$ still appears in at least $\lceil z/2 \rceil$ negative pools (correctly recorded) to be identified. Note that a $d$-disjunct matrix is a $d^1$-disjunct matrix and offers no error tolerance.
Ngo and Du [4] gave a construction of $d^z$-disjunct matrices. Consider $2k$ vertices. A matching is called an $m$-matching if it consists of $m$ matches. Then there are
$$g(m, k) = \binom{2k}{2m} \frac{(2m)!}{m!\, 2^m}$$
$m$-matchings, where $m \le k$. Construct a $g(d, k) \times n$ matrix $M$ by indexing its rows by all the $d$-matchings, $d < k$, and its columns by $n$ arbitrary (but distinct) $k$-matchings. $M$ has a 1-entry in cell $(i, j)$ if and only if the index of row $i$ is contained in the index of column $j$. For each column $C$ and a set $D = \{D_1, \ldots, D_d\}$ of other columns, define a hypergraph $H(C, D)$ whose vertices are the matches in the $k$-matching of $C$, and whose edge $E_i$ consists of the matches in $C \setminus D_i$, $1 \le i \le d$. Since each pair of maximum matchings differs in at least two matches, $l_i \equiv |E_i| \ge 2$. Ngo and Du proved that $M$ is $d^z$-disjunct with $z = d + 1$.
By Theorems 2.2 and 3.2 and Lemma 2.6, we improve $z$ to $2^{k-d-1}(2d - k + 2)$ for $d + 1 \le k \le 2d - 1$, and $2^d$ for $2d \le k$.
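To get a feel for the sizes involved, the count of $m$-matchings and the improved value of $z$ can be tabulated. This is a small sketch, not code from [4]: `g` uses the standard count of $m$-matchings on $2k$ vertices (choose $2m$ vertices, then pair them), `count_matchings` is a brute-force cross-check, and `improved_z` simply evaluates the two expressions stated above; all three names are illustrative.

```python
from itertools import combinations
from math import comb, factorial


def g(m, k):
    """Number of m-matchings on 2k vertices: choose 2m vertices, pair them up."""
    return comb(2 * k, 2 * m) * factorial(2 * m) // (factorial(m) * 2 ** m)


def count_matchings(m, k):
    """Brute-force count of sets of m pairwise disjoint pairs among 2k vertices."""
    pairs = list(combinations(range(2 * k), 2))
    return sum(
        1
        for chosen in combinations(pairs, m)
        if len({vertex for pair in chosen for vertex in pair}) == 2 * m
    )


def improved_z(d, k):
    """Improved z for the matching construction, as stated in the text."""
    if k >= 2 * d:
        return 2 ** d
    if d + 1 <= k <= 2 * d - 1:
        return 2 ** (k - d - 1) * (2 * d - k + 2)
    raise ValueError("the construction requires d < k")


print(g(2, 3), count_matchings(2, 3))  # 45 45
for d, k in [(2, 3), (3, 5), (3, 6)]:
    print(d, k, d + 1, improved_z(d, k))  # previous z = d + 1 versus improved z
```

For $k = d + 1$ the improved value coincides with $d + 1$; the gain appears once $k$ exceeds $d + 1$.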
Acknowledgements
The authors would like to thank the referees for their helpful comments in revising this paper.
The first, third and fourth authors’ research was partially supported by a Republic of China National Science Council grant NSC 92-2115-M-009-014. The second author’s research was partially supported by a Republic of China National Science Council grant NSC 92-2115-M-009-004.
References
[1] D.Z. Du, F.K. Hwang, Pooling Designs and Nonadaptive Group Testing, World Scientific, Singapore, 2006.
[2] W.H. Kautz, R.C. Singleton, Nonrandom binary superimposed codes, IEEE Trans. Inform. Thy. 10 (1964) 363–377.
[3] A.J. Macula, Error-correcting nonadaptive group testing with $d^e$-disjunct matrices, Discrete Appl. Math. 80 (1997) 217–222.
[4] H.Q. Ngo, D.Z. Du, New constructions of non-adaptive and error-tolerance pooling designs, Discrete Math. 243 (2002) 161–170.