Discrete Mathematics 268 (2003) 311–314
www.elsevier.com/locate/disc
Note
On Macula’s error-correcting pool designs
F.K. Hwang
Department of Applied Mathematics, National Chiao Tung University, Hsin-chu 30050, Taiwan, ROC
Received 11 December 2001; received in revised form 15 November 2002; accepted 3 December 2002
Abstract
We show that Macula’s claim of a Hamming distance 4 between any two candidate sets of positive clones in his pool design is incorrect. However, a previous proof of his on a weaker result (with a condition on design parameters) is correct. We also show that the condition is sharp and the distance 4 result is also sharp for arbitrary parameter values.
© 2003 Elsevier Science B.V. All rights reserved.
Keywords: Pooling designs; Group testing; Error-correcting; Disjunct matrix
1. Introduction
A clone library stores clones, which are subsequences of a particular DNA sequence. Often, one needs to know which clones contain a given probe, a specified DNA subsequence of interest. We call a clone positive if it contains the probe, and negative if not. It would be time-consuming and costly to assay the clones one by one. Since the number of positive clones is typically small, one can pool a subset of clones together for a single assay. The assay outcome is negative if all clones in the pool are negative, and positive otherwise. A pool design is a 0-1 matrix where columns represent clones, rows represent pools, and a 1-entry in cell $(i, j)$ signifies that clone $j$ is in pool $i$. The goal of a pool design is to separate the positive clones from the negative clones as completely as possible with a minimum number of pools.
For a binary matrix with $t$ rows, we can view each column as a subset of the set $\{1, \ldots, t\}$, namely the set of positions of its 1-entries. Such a matrix is called $d$-disjunct if no column is contained in the union of any other $d$ columns. It is well known [1] that a $d$-disjunct matrix can identify all positive clones as long as the number $p$ of positive clones satisfies $p \le d$. Recently, Macula [3] introduced the notion of $d^e$-disjunctness: a matrix is $d^e$-disjunct if every column has at least $e + 1$ 1-entries not in the union of any other $d$ columns. Another relevant notion is the Hamming distance $H(M)$ of a $d$-disjunct matrix $M$, defined as the minimum number of bit disagreements between the union of $u$ columns and the union of $v$ columns, taken over distinct sets of columns with $u \le v \le d$.
Research partially supported by the Republic of China NSC grant 90-2115-M-009-029.
E-mail address: [email protected] (F.K. Hwang).
0012-365X/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0012-365X(03)00034-7
Macula [2] gave a construction of a $d$-disjunct matrix. Suppose there are $z$ clones to be screened. Select $n, k, d$ such that $d < k$ and $\binom{n}{k} \ge z$. Let $[n]$ denote the set $\{1, \ldots, n\}$ and $\binom{[n]}{k}$ the set of all $k$-subsets of $[n]$. Randomly select $z$ members of $\binom{[n]}{k}$ to label the clones (columns), and label the rows by the set $\binom{[n]}{d}$ (so there are $\binom{n}{d}$ rows). The design $\delta_z(n, d, k)$ has a 1-entry in cell $(i, j)$ if and only if the label of row $i$ is contained in the label of column $j$. Macula proved that $\delta_z(n, d, k)$ is $d$-disjunct.
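For small parameters the construction can be checked by brute force. The Python sketch below is an illustration only: it assumes the full design in which every $k$-subset of $[n]$ labels a column (rather than a random selection of $z$ of them), and the names `macula_columns` and `is_d_disjunct` are ours, not taken from [2].

```python
from itertools import combinations

def macula_columns(n, d, k):
    """Map each k-subset label to the set of rows (d-subsets) it contains."""
    return {
        frozenset(c): {frozenset(r) for r in combinations(c, d)}
        for c in combinations(range(1, n + 1), k)
    }

def is_d_disjunct(columns, d):
    """True if no column is contained in the union of any other d columns."""
    labels = list(columns)
    for c0 in labels:
        others = [c for c in labels if c != c0]
        for group in combinations(others, d):
            union = set().union(*(columns[c] for c in group))
            if columns[c0] <= union:
                return False
    return True

# Assumed small parameters: n = 5, d = 2, k = 3.
cols = macula_columns(5, 2, 3)
print(len(cols), is_d_disjunct(cols, 2))
```

The check is exponential in $d$ and is meant only for toy sizes.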
Macula [3] also considered the enhanced matrix $\delta^*_z(n, d, k)$, which is obtained from $\delta_z(n, d, k)$ by adding $n$ additional pools labeled $I_1, I_2, \ldots, I_n$, where $I_i$ contains all clones whose labels do not contain $i$. He claimed that $H(\delta^*_z(n, d, k)) \ge 4$ (hence 1-error-correcting) by proving
Theorem 1. $\delta^*_z(n, d, k)$ is $d^1$-disjunct.
We will show that this claim is wrong on several counts. Nevertheless, a previous weaker claim of Macula as reported by Du and Hwang [1] remains correct:
Theorem 2. Suppose $k - d \ge 3$. Then $H(\delta^*_z(n, d, k)) \ge 4$.
Further, we show that both the condition $k - d \ge 3$ and the result of distance 4 are sharp.
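The Hamming distance of a small enhanced design can be computed exhaustively. The following Python sketch makes several assumptions of its own: it uses the full design (all $k$-subsets as columns), takes candidate sets to be the nonempty sets of at most $d$ columns, and encodes the additional pool $I_i$ as the tuple `("I", i)`. The function names are hypothetical.

```python
from itertools import combinations

def enhanced_column(label, n, d):
    """Rows hit by a column: d-subsets of its label, plus pools I_i with i not in the label."""
    rows = {frozenset(r) for r in combinations(sorted(label), d)}
    rows |= {("I", i) for i in range(1, n + 1) if i not in label}
    return frozenset(rows)

def hamming_distance(n, d, k):
    """Minimum disagreement between unions of two distinct candidate sets."""
    cols = [enhanced_column(c, n, d) for c in combinations(range(1, n + 1), k)]
    unions = []
    for size in range(1, d + 1):
        for group in combinations(cols, size):
            unions.append(frozenset().union(*group))
    return min(len(a ^ b) for a, b in combinations(unions, 2))

# Assumed parameters with k - d = 3, the boundary case of Theorem 2.
h = hamming_distance(7, 2, 5)
print(h)
```

For these parameters the result is consistent with Theorem 2 in that it is not below 4.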
2. The main result
We first give a counter-example against Theorem 1.
Example 1. $\delta^*_z(5, 2, 3)$ containing three columns $C_0 = \{1, 2, 3\}$, $C_1 = \{1, 2, 4\}$, $C_2 = \{1, 3, 5\}$. It is easily verified that the only 1-entry in $C_0$ but not in the union of $C_1$ and $C_2$ is the row with label $\{2, 3\}$. Hence $\delta^*_z(5, 2, 3)$ is not $d^1$-disjunct.
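Example 1 can be verified mechanically. This is a minimal Python sketch under our own encoding: the rows of the enhanced design are the 2-subsets of $[5]$ together with the five additional pools, the latter written as tuples `("I", i)`; the helper name `enhanced_column` is hypothetical.

```python
from itertools import combinations

def enhanced_column(label, n, d):
    """Rows hit by a column: d-subsets of its label, plus pools I_i with i not in the label."""
    rows = {frozenset(r) for r in combinations(sorted(label), d)}
    rows |= {("I", i) for i in range(1, n + 1) if i not in label}
    return rows

n, d = 5, 2
c0, c1, c2 = {1, 2, 3}, {1, 2, 4}, {1, 3, 5}

# 1-entries of C0 that are covered by neither C1 nor C2.
private = enhanced_column(c0, n, d) - (
    enhanced_column(c1, n, d) | enhanced_column(c2, n, d)
)
print(private)
```

A $d^1$-disjunct matrix would need at least two such entries; here there is exactly one.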
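The problem in the proof of Theorem 1 lies in the statement that if $C_0, C_1, \ldots, C_d$ are $d + 1$ distinct columns with $|C_0 \setminus C_i| = 1$ for $1 \le i \le d$, then $C_0 \setminus C_i \ne C_0 \setminus C_j$ implies $C_i \setminus C_0 = C_j \setminus C_0$. The above example shows that the implication does not hold, since $C_0 \setminus C_1 = \{3\} \ne \{2\} = C_0 \setminus C_2$, yet $C_1 \setminus C_0 = \{4\} \ne \{5\} = C_2 \setminus C_0$.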
Example 1 can be extended to general $d$, $k$ with $k > d$. Let
$C_i = [k + 1] \setminus \{k + 1 - i\}$, $0 \le i \le d - 1$, and
$C_d = [k + 2] \setminus \{k - d + 1, k + 1\}$.
Then the only 1-entry in $C_0$ but not in the union of $C_1, \ldots, C_d$ is the row with label $\{k - d + 1, k - d + 2, \ldots, k\}$.
Next we argue that even if Theorem 1 were correct, it would not be enough to substantiate the claim that $H(\delta^*_z(n, d, k)) \ge 4$. This is because the two candidate sets of positive clones may differ in only one column $C$. Then the Hamming distance between those two sets is simply the number of 1-entries in $C$ but not in the union of the other columns, which is only guaranteed to be 2 by Theorem 1. Note that $d^1$-disjunctness would imply $H(\delta^*_z(n, d, k)) \ge 4$ if $d$ were the exact number of positive clones, not just an upper bound: two distinct candidate sets of size exactly $d$ each contain a column missing from the other, and each such column contributes at least 2.
In a different sense, $d^1$-disjunctness is too strong a property to require for proving a Hamming distance of 4. For example, one column in one candidate set may contribute only distance 1, while the other candidate set contributes distance 3 to compensate. The two sets have Hamming distance 4, but do not satisfy $d^1$-disjunctness. Note that the counter-example given at the beginning of this section is not a counter-example against Theorem 2, since it is easily verified that any two candidate sets of cardinality $\le 2$ have Hamming distance at least 4. A formal proof of Theorem 2 can be found in [1].
Can the condition $k - d \ge 3$ in Theorem 2 be eliminated (as in Theorem 1) or at least weakened? The following example shows that it cannot.
Example 2. $\delta^*_z(7, 3, 5)$ containing columns $C_1 = \{1, 2, 3, 4, 5\}$, $C_2 = \{1, 2, 3, 4, 6\}$ and $C_3 = \{1, 2, 3, 5, 7\}$. Consider the two candidate sets $\{C_1, C_2, C_3\}$ and $\{C_2, C_3\}$. It is easily verified that they differ only in the three rows with labels $\{1, 4, 5\}$, $\{2, 4, 5\}$, $\{3, 4, 5\}$.
We now expand the example to arbitrary $k$ with $d = k - 2$ and $d \ge 3$.
Let $n \ge k + 2$; then $\delta^*_z(n, k - 2, k)$ contains the $k - 2$ columns
$C_i = [k + 1] \setminus \{k + 2 - i\}$, $1 \le i \le k - 3$, and
$C_{k-2} = [k + 2] \setminus \{4, k + 1\}$.
Then the two candidate sets $\{C_1, C_2, \ldots, C_{k-2}\}$ and $\{C_2, \ldots, C_{k-2}\}$ differ only in the rows with labels $\{1, 4, 5, \ldots, k\}$, $\{2, 4, 5, \ldots, k\}$ and $\{3, 4, 5, \ldots, k\}$.
Examples for $k - d < 2$ are even easier to construct and are omitted here.
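The distance-3 examples with $d = k - 2$ can be checked numerically. In the Python sketch below, the column family is indexed as $C_1, \ldots, C_{k-2}$ with $C_i = [k+1] \setminus \{k+2-i\}$ for $1 \le i \le k-3$; this index range is our reading, chosen so that $k = 5$ reproduces the three columns of Example 2. The two candidate sets compared are all $k - 2$ columns and all but $C_1$. Function names and the `("I", i)` encoding of the additional pools are assumptions of the sketch.

```python
from itertools import combinations

def enhanced_column(label, n, d):
    """Rows hit by a column: d-subsets of its label, plus pools I_i with i not in the label."""
    rows = {frozenset(r) for r in combinations(sorted(label), d)}
    rows |= {("I", i) for i in range(1, n + 1) if i not in label}
    return rows

def outcome(labels, n, d):
    """Union of the columns in a candidate set."""
    return set().union(*(enhanced_column(c, n, d) for c in labels))

def example2_columns(k):
    """C_1, ..., C_{k-2}; the index range 1 <= i <= k-3 is an assumption."""
    cols = [set(range(1, k + 2)) - {k + 2 - i} for i in range(1, k - 2)]
    cols.append(set(range(1, k + 3)) - {4, k + 1})
    return cols

distances = {}
for k in (5, 6, 7):
    n, d = k + 2, k - 2
    cols = example2_columns(k)
    # Compare the full set of columns with the set obtained by dropping C_1.
    diff = outcome(cols, n, d) ^ outcome(cols[1:], n, d)
    distances[k] = len(diff)
print(distances)
```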
Next we show that regardless of how large $k - d$ is, the guaranteed Hamming distance remains at 4.
Example 3. $\delta^*_z(n, 2, k)$ (where $n \ge k + 1$) containing three columns $C_1 = \{1, \ldots, k\}$, $C_2 = \{1, \ldots, k - 1, k + 1\}$, $C_3 = \{1, \ldots, k - 2, k, k + 1\}$. Consider the two candidate sets $\{C_1, C_2\}$ and $\{C_2, C_3\}$. It is easily verified that the only four differing rows are those labeled by $\{k - 1, k\}$ and $\{k, k + 1\}$, together with the additional pools $I_{k-1}$ and $I_{k+1}$.
Again, Example 3 can be extended to general $d$. Let
$C_i = [k + 1] \setminus \{k + 2 - i\}$, $1 \le i \le d + 1$.
Then the two candidate sets $\{C_1, \ldots, C_d\}$ and $\{C_2, \ldots, C_{d+1}\}$ differ only in the four rows with labels $\{k - d + 1, k - d + 2, \ldots, k\}$ and $\{k - d + 2, k - d + 3, \ldots, k + 1\}$, together with the additional pools $I_{k-d+1}$ and $I_{k+1}$.
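The general form of Example 3 can likewise be checked for small parameters. The Python sketch below builds $C_i = [k+1] \setminus \{k+2-i\}$ for $1 \le i \le d+1$ with $n = k + 1$ and computes the rows in which the two candidate sets differ. The helper names and the `("I", i)` encoding of the additional pools are assumptions of the sketch, as is the parameter range tested.

```python
from itertools import combinations

def enhanced_column(label, n, d):
    """Rows hit by a column: d-subsets of its label, plus pools I_i with i not in the label."""
    rows = {frozenset(r) for r in combinations(sorted(label), d)}
    rows |= {("I", i) for i in range(1, n + 1) if i not in label}
    return rows

def outcome(labels, n, d):
    """Union of the columns in a candidate set."""
    return set().union(*(enhanced_column(c, n, d) for c in labels))

def example3_columns(k, d):
    """C_1, ..., C_{d+1} with C_i = [k+1] minus {k+2-i}."""
    return [set(range(1, k + 2)) - {k + 2 - i} for i in range(1, d + 2)]

diffs = {}
for k in range(4, 7):
    for d in range(2, k):
        cols = example3_columns(k, d)
        # {C_1, ..., C_d} versus {C_2, ..., C_{d+1}}
        diffs[(k, d)] = outcome(cols[:d], k + 1, d) ^ outcome(cols[1:], k + 1, d)

print({key: len(rows) for key, rows in diffs.items()})
```

Each pair of candidate sets differs in two $d$-subset rows and two additional pools.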
A referee reminds us that a $d^e$-disjunct matrix can correct $e$ errors. The decoding procedure is to take a subset $E$ of rows and change all outcomes in these rows, for every $E$ with $|E| \le e$. Let $V$ denote the outcome vector before the change, and let $V_E \equiv V \cup E$ be the outcome vector after the change. Then a column $C$ is positive if and only if there exists an $E$ such that $V_E$ contains $C$. To see this, note that when $E$ is the set of errors, the outcome vector is corrected back to the errorless state, in which $C$ only appears in rows with positive outcomes. On the other hand, if $C$ is negative, then $d^e$-disjunctness guarantees that $C$ has at least $e + 1$ rows not in $V$, and at most $e$ of them are in $E$; hence $C$ has a row not in $V_E$.
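The decoding rule can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the toy design (three stacked $4 \times 4$ identity blocks, so that every column has three private rows and $e = 2$), the choice of positives, and the restriction of the simulated errors to positive outcomes that were read as negative. The function name `decode` is ours.

```python
from itertools import combinations

def decode(columns, observed, e):
    """Declare C positive iff C is contained in V ∪ E for some row set E with |E| <= e."""
    all_rows = sorted(set().union(*columns.values()))
    positives = set()
    for name, col in columns.items():
        for size in range(e + 1):
            if any(col <= observed | set(E) for E in combinations(all_rows, size)):
                positives.add(name)
                break
    return positives

# Hypothetical toy design: column j has 1-entries in rows j, j + 4, j + 8.
e = 2
columns = {j: {j, j + 4, j + 8} for j in range(4)}

truth = {0, 2}                                    # assumed positive columns
errorless = set().union(*(columns[j] for j in truth))
observed = errorless - {4, 10}                    # two outcomes wrongly read as negative

decoded = decode(columns, observed, e)
print(decoded)
```

Enumerating every $E$ mirrors the procedure as stated; since $V_E = V \cup E$, the same decision is obtained by testing whether $C$ has at most $e$ rows outside $V$.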
Acknowledgements
The author thanks Y.C. Liu for providing extensions to the examples, and a referee for providing the above paragraph.
References
[1] D.Z. Du, F.K. Hwang, Combinatorial Group Testing and Its Applications, 2nd Edition, World Scientific, Singapore, 2000.
[2] A.J. Macula, A simple construction of d-disjunct matrices with certain constant weights, Discrete Math. 162 (1996) 311–312.
[3] A.J. Macula, Error correcting nonadaptive group testing with de-disjunct matrices, Discrete Appl. Math.