A class of error-correcting pooling designs over complexes


DOI 10.1007/s10878-008-9179-4


Tayuan Huang · Kaishun Wang · Chih-Wen Weng

Published online: 26 July 2008

Abstract As a generalization of $d^e$-disjunct matrices and (w, r; d)-cover-free families, the notion of $(s, l)^e$-disjunct matrices is introduced for error-correcting pooling designs over complexes (or set pooling designs). We show that (w, r; d)-cover-free families form a class of $(s, l)^e$-disjunct matrices. Moreover, a decoding algorithm for pooling designs based on $(s, l)^e$-disjunct matrices is considered.

Keywords Pooling design · Disjunct matrix · Decoding · Complex

1 Introduction

The notion of superimposed code was first introduced by Kautz and Singleton (1964) in the context of superimposed binary codes. It was then generalized to $d^e$-disjunct matrices by D’yachkov et al. (1989) and by Macula (1997), to superimposed (s, l)-codes and superimposed (s, l)-designs by D’yachkov et al. (2002), and recently to (w, r; d)-generalized cover-free families by Stinson and Wei (2004).

In the context of (s, l; e)-cover-free families, d-disjunct matrices with (s, l; e) = (1, d; 1) have been generalized to $d^e$-disjunct matrices with (s, l; e) = (1, d; e + 1) for error-correcting purposes (D’yachkov et al. 1989; Macula 1997); on the other hand, they have also been generalized to (s, l)-superimposed designs (D’yachkov et al. 2002) with (s, l; e) = (s, l; 1) for the purpose of group testing over complexes. All these structures have found applications in the design of combinatorial group testing applicable to DNA library screening, and they are therefore called pooling designs with various additional properties.

This paper was presented at Algebraic Combinatorics, an international conference held in Sendai, June 2006, in honor of the 60th birthday of Professor E. Bannai.

T. Huang · C.-W. Weng
Department of Applied Mathematics, National Chiao-Tung University, Hsinchu 30010, Taiwan

K. Wang
Sch. Math. Sci. & Lab. Math. Com. Sys., Beijing Normal University, Beijing 100875, China
e-mail: [email protected]

More precisely, consider a set [t] = {1, 2, . . . , t} of molecules; the goal is to identify an unknown family ℘ = {$P_1, P_2, \ldots, P_k$} where the joint appearance of all molecules in each $P_i$ causes a certain given biological phenomenon. An experiment, sometimes called a pool, can be applied to an arbitrary subset S ⊆ [t] with two possible outcomes: a negative outcome implies that S does not contain any $P_i$ ∈ ℘, and a positive outcome implies otherwise. Members of ℘ are called positive complexes. Such a model is usually referred to as the complex model. Of particular note is the basic assumption that members of ℘ are subject to non-inclusion, i.e., no member contains another. See (Du and Ngo 2000; Du and Hwang 2006) for more details and (Chen et al. 2008; Huang et al. 2007; Huang et al. 2008; Huang and Weng 2004) for related study.

In this paper, as a generalization of $d^e$-disjunct matrices and (s, l; e)-cover-free families, the notion of $(s, l)^e$-disjunct matrices is introduced for error-correcting pooling designs (also called set pooling designs, or group testing over complexes). We show that (s, l; e)-cover-free families form a class of $(s, l)^e$-disjunct matrices in Sect. 3; moreover, a decoding algorithm for error-correcting pooling designs based on $(s, l)^e$-disjunct matrices is given in Sect. 4.

2 Preliminary

For an N × t binary matrix M, let $R_i$ and $C_j$ denote the i-th row and j-th column of M, respectively. In this paper, we also let $C_j$ denote the subset of [N] consisting of all i with $M_{ij} = 1$. For positive integers s, l and t such that s + l ≤ t, let ℘(s, l, t) be the family of all antichains ℘ = {$P_1, P_2, \ldots, P_k$} with $P_i \subseteq [t]$, $|P_i| \le l$, and 1 ≤ k ≤ s. Here ℘ = {$P_1, P_2, \ldots, P_k$} is called an antichain if and only if $P_i$ and $P_j$ are not comparable under inclusion whenever i and j are distinct.
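As a small illustration of this notation, the following Python sketch computes the column supports $C_j$ of a binary matrix and tests whether a given family lies in ℘(s, l, t). The function names `column_supports`, `is_antichain` and `in_family` are hypothetical and chosen only for this example, and indices are 0-based here, whereas the text uses 1-based indices.

```python
from itertools import combinations


def column_supports(M):
    """Return C_j for each column j: the set of row indices i with M[i][j] == 1."""
    n_cols = len(M[0])
    return [frozenset(i for i, row in enumerate(M) if row[j] == 1)
            for j in range(n_cols)]


def is_antichain(family):
    """True if no two distinct members of the family are comparable by inclusion."""
    sets = [frozenset(P) for P in family]
    return all(not (P <= Q or Q <= P) for P, Q in combinations(sets, 2))


def in_family(family, s, l, t):
    """Membership test for the family of antichains written ℘(s, l, t) in the text.

    Assumption of this sketch: samples are numbered 0, ..., t-1.
    """
    sets = [frozenset(P) for P in family]
    ground = frozenset(range(t))
    return (1 <= len(sets) <= s
            and all(P <= ground and len(P) <= l for P in sets)
            and is_antichain(sets))


M = [[1, 0, 1],
     [0, 1, 1]]
print(column_supports(M))                # supports {0}, {1}, {0, 1}
print(is_antichain([{0, 1}, {1, 2}]))    # True
print(is_antichain([{0}, {0, 1}]))       # False, since {0} is contained in {0, 1}
print(in_family([{0, 1}, {2}], 2, 2, 4)) # True
```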

The model of set pooling designs may be traced back to Torney (1999) and was carried out by D’yachkov et al. (2002).

Definition 2.1 (D’yachkov et al. 2002) A binary matrix M of order N × t is called

(1) a superimposed (s, l)-code if, for any two disjoint subsets S, L of [t] with |S| = s and |L| = l, there exists a row with entry 1 over L and 0 over S;

(2) a superimposed (s, l)-design if

$$\bigcup_{P_i \in \wp} \Big( \bigcap_{j \in P_i} C_j \Big) \ne \bigcup_{P'_i \in \wp'} \Big( \bigcap_{j \in P'_i} C_j \Big)$$

for any two distinct $\wp = \{P_1, P_2, \ldots, P_k\}$, $\wp' = \{P'_1, P'_2, \ldots, P'_h\} \in \wp(s, l, t)$.

They showed that each (s, l)-superimposed code is an (s, l)-superimposed design, and each (s, l)-superimposed design is an (s − 1, l)-superimposed code and an (s, l − 1)-superimposed code as well. On the other hand, the following notion of (w, r; d)-cover-free families was introduced by Stinson and Wei (2004).

Definition 2.2 (Stinson and Wei 2004) Let w, r and d be positive integers. A set system $(X, \mathcal{B})$ is called a (w, r; d)-cover-free family (or (w, r; d)-CFF) provided that, for any w blocks $B_1, \ldots, B_w \in \mathcal{B}$ and any other r blocks $A_1, \ldots, A_r \in \mathcal{B}$, we have that

$$\left| \bigcap_{1 \le i \le w} B_i \setminus \bigcup_{1 \le j \le r} A_j \right| \ge d.$$

Note that the point-block incidence matrix of an (l, s; 1)-cover-free family is indeed a superimposed (s, l)-code. The notion of $(s, l)^e$-disjunct matrices is introduced as a common generalization of $d^e$-disjunct matrices and (w, r; d)-cover-free families.
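For small set systems, the condition of Definition 2.2 can be checked by exhaustive search. The sketch below is one possible brute-force check, not an efficient algorithm; the name `is_cover_free` and the example set system (points are the 2-subsets of a 4-set, and block j consists of the pairs containing j) are assumptions made only for illustration. Called with (w, r; d) = (l, s; 1) on the column supports of a matrix, it tests the superimposed (s, l)-code condition of Definition 2.1 (1).

```python
from itertools import combinations


def is_cover_free(blocks, w, r, d):
    """Brute-force check of the (w, r; d)-cover-free condition.

    blocks: list of sets of points. For every choice of w blocks and r other
    blocks, the intersection of the former minus the union of the latter
    must contain at least d points.
    """
    indices = range(len(blocks))
    for W in combinations(indices, w):
        inter = set.intersection(*(set(blocks[i]) for i in W))
        others = [i for i in indices if i not in W]
        for R in combinations(others, r):
            union = set().union(*(blocks[j] for j in R))
            if len(inter - union) < d:
                return False
    return True


# Hypothetical example: points are the six 2-subsets of {0, 1, 2, 3};
# block j contains the (indices of the) pairs that include j.
pairs = list(combinations(range(4), 2))
blocks = [{i for i, p in enumerate(pairs) if j in p} for j in range(4)]

print(is_cover_free(blocks, 2, 2, 1))  # True
print(is_cover_free(blocks, 1, 3, 1))  # False: three other blocks cover any block
```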

Definition 2.3 For positive integers s, l, t with s + l ≤ t, a binary matrix M of order N × t is called an $(s, l)^e$-disjunct matrix if

$$\left| \bigcap_{i \in A} C_i \setminus \bigcup_{P_i \in \wp} \bigcap_{j \in P_i} C_j \right| \ge e$$

for any antichain $\wp = \{P_1, P_2, \ldots, P_k\} \in \wp(s, l, t)$, and for any $A \subseteq [t]$ with $|A| \le l$ and $A \notin \wp$.

An $(s, l)^e$-disjunct matrix M can be used for a pooling design in the following way. Let the columns of M be identified with the samples and its rows with the pools for testing, so that M(i, j) = 1 if the j-th sample is included in the i-th pool. Suppose the set [t] = {1, 2, . . . , t} represents the set of samples with a (to be identified) positive family ℘ = {$P_1, P_2, \ldots, P_k$} ⊆ ℘([t]), the power set of [t]; each test checks whether a pool completely contains at least one positive set $P_i$ ∈ ℘.

After the testing, the outcome vector

$$o(\wp) = o(\wp, M) = \text{the characteristic vector of the set } \bigcup_{P_i \in \wp} \bigcap_{j \in P_i} C_j$$

is reported for $\wp = \{P_1, P_2, \ldots, P_k\} \in \wp(s, l, t)$ if no error occurs during the process, i.e., a test is reported positive only if its pool contains a certain positive subset $P_i$. Suppose instead that the report $o(\wp) + \epsilon$ with an error vector $\epsilon$ is received. Theorem 4.1 shows that errors occurring during the testing process can be detected whenever the weight of $\epsilon$ is less than e, and can be corrected whenever the weight of $\epsilon$ is no larger than $\frac{e-1}{2}$. In case l = 1, each $P_i$ ∈ ℘ is reduced to a singleton, and the notion then reduces to $k^e$-disjunct matrices, whose decoding algorithm was discussed in (Huang and Weng 2003).
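The outcome vector and the effect of an error vector can be made concrete with a short Python sketch. The helper names `outcome`, `add_error` and `hamming`, the 0-based indexing, and the example matrix (one pool for each 2-subset of four samples) are assumptions made only for this illustration.

```python
from itertools import combinations


def outcome(M, family):
    """Error-free outcome vector: entry i is 1 exactly when pool i contains
    every sample of at least one positive complex in the family."""
    return [int(any(all(row[j] == 1 for j in P) for P in family)) for row in M]


def add_error(o, error_positions):
    """Flip the entries of o at the given positions (adding an error vector mod 2)."""
    flipped = set(error_positions)
    return [bit ^ 1 if i in flipped else bit for i, bit in enumerate(o)]


def hamming(u, v):
    """Hamming distance between two vectors of equal length."""
    return sum(a != b for a, b in zip(u, v))


# Hypothetical example: one pool per 2-subset of the samples {0, 1, 2, 3}.
pools = list(combinations(range(4), 2))
M = [[1 if j in pool else 0 for j in range(4)] for pool in pools]

family = [{0, 1}, {2, 3}]
o = outcome(M, family)
received = add_error(o, [2])

print(o)                    # [1, 0, 0, 0, 0, 1]
print(received)             # [1, 0, 1, 0, 0, 1]
print(hamming(o, received)) # 1
```

Only the pools {0, 1} and {2, 3} contain a whole positive complex, so exactly those two tests are positive before the error is added.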

3 Some properties of $(s, l)^e$-disjunct matrices

Some good explicit constructions of generalized cover-free families, as well as non-constructive existence results using the probabilistic method, including the Lovász Local Lemma, can be found in (Stinson and Wei 2004); some bounds (i.e., necessary conditions) for generalized cover-free families were obtained through two different approaches. Theorem 3.1 shows that generalized cover-free families provide a source of $(s, l)^e$-disjunct matrices. Some properties of $(s, l)^e$-disjunct matrices are given in Lemma 3.2 and Theorem 3.3, with a view to the error tolerance of the pooling designs based on them.

Theorem 3.1 The point-block incidence matrix M of an (l, s; e)-cover-free family $\{C_1, C_2, \ldots, C_t\}$ is an $(s, l)^e$-disjunct matrix of order N × t.

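Proof For any antichain $\wp = \{P_1, P_2, \ldots, P_k\} \in \wp(s, l, t)$, and for any $A \subseteq [t]$ with $|A| \le l$ and $A \notin \wp$, let $a_i \in P_i$ for $i \le k \le s$ and let $S \subseteq [t]$ be an s-subset containing $\{a_1, \ldots, a_k\}$. Then $\bigcup_{P_i \in \wp} \big( \bigcap_{j \in P_i} C_j \big) \subseteq \bigcup_{1 \le i \le k} C_{a_i} \subseteq \bigcup_{j \in S} C_j$, and hence

$$\left| \bigcap_{i \in A} C_i \setminus \bigcup_{P_i \in \wp} \bigcap_{j \in P_i} C_j \right| \ge \left| \bigcap_{i \in A} C_i \setminus \bigcup_{j \in S} C_j \right| \ge \left| \bigcap_{i \in A'} C_i \setminus \bigcup_{j \in S} C_j \right| \ge e,$$

where $A \subseteq A' \subseteq [t]$ with $|A'| = l$, because $\{C_1, C_2, \ldots, C_t\}$ is an (l, s; e)-generalized cover-free family.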

Lemma 3.2 Let M be an $(s, l)^e$-disjunct matrix. Then $d_H(o(\wp), o(\wp')) \ge e$ whenever $\wp, \wp' \in \wp(s, l, t)$ are distinct.

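Proof Without loss of generality, we may assume that $\wp \setminus \wp'$ is non-empty and $A \in \wp \setminus \wp'$. We have

$$\left| \bigcap_{i \in A} C_i \setminus \bigcup_{B \in \wp'} \bigcap_{j \in B} C_j \right| \ge e$$

by definition. Since every test in $\bigcap_{i \in A} C_i$ is positive in $o(\wp)$, at least e tests are positive in $o(\wp)$ but negative in $o(\wp')$, and therefore $d_H(o(\wp), o(\wp')) \ge e$.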

For an $(s, l)^e$-disjunct matrix M, we are interested in its minimum distance, i.e., the minimum of the set $\{ d_H(o(\wp), o(\wp')) \mid \wp, \wp' \in \wp(s, l, t),\ \wp \ne \wp' \}$.
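For very small parameters this minimum distance can be computed by exhaustive enumeration. The Python sketch below does so under the assumption that every set in a family is non-empty; the names `families`, `outcome` and `min_distance` are hypothetical, indices are 0-based, and the running time grows quickly with t, s and l. The example matrix consists of two stacked copies of the 3 × 3 identity matrix, which (under the same non-emptiness assumption) satisfies the condition of Definition 2.3 with (s, l) = (2, 1) and e = 2.

```python
from itertools import combinations


def families(s, l, t):
    """Yield every antichain of 1..s non-empty subsets of range(t), each of size <= l."""
    subsets = [frozenset(c) for size in range(1, l + 1)
               for c in combinations(range(t), size)]
    for k in range(1, s + 1):
        for fam in combinations(subsets, k):
            if all(not (P <= Q or Q <= P) for P, Q in combinations(fam, 2)):
                yield fam


def outcome(M, family):
    """Error-free outcome vector of the family with respect to the matrix M."""
    return [int(any(all(row[j] == 1 for j in P) for P in family)) for row in M]


def min_distance(M, s, l):
    """Minimum Hamming distance between outcome vectors of distinct families."""
    t = len(M[0])
    outs = [outcome(M, fam) for fam in families(s, l, t)]
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in combinations(outs, 2))


# Two stacked copies of the 3 x 3 identity matrix (a hypothetical toy design).
M = [[1, 0, 0], [0, 1, 0], [0, 0, 1]] * 2

print(sum(1 for _ in families(2, 1, 3)))  # 6 families
print(min_distance(M, 2, 1))              # 2
```

The value 2 is attained, for instance, by the families {{0}} and {{0}, {1}}, in agreement with the bound of Lemma 3.2 for e = 2.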

Theorem 3.3 Let M be an $(s, l)^e$-disjunct matrix, and let $\wp, \wp' \in \wp(s, l, t)$ be distinct. Then the following hold:

(1) If $\wp \not\subset \wp'$ and $\wp' \not\subset \wp$, then $d_H(o(\wp), o(\wp')) \ge 2e$.

(2) If $\wp \subset \wp'$, then $d_H(o(\wp), o(\wp')) \ge e$.

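Proof (1) Let $\wp = \{P_1, P_2, \ldots, P_k\}$, $\wp' = \{P'_1, P'_2, \ldots, P'_h\} \in \wp(s, l, t)$, and choose $Q \in \wp \setminus \wp'$ and $Q' \in \wp' \setminus \wp$, which exist by assumption. Then

$$
\begin{aligned}
d_H(o(\wp), o(\wp'))
&= \left| \bigcup_{P_i \in \wp} \bigcap_{j \in P_i} C_j \setminus \bigcup_{P'_i \in \wp'} \bigcap_{j \in P'_i} C_j \right|
 + \left| \bigcup_{P'_i \in \wp'} \bigcap_{j \in P'_i} C_j \setminus \bigcup_{P_i \in \wp} \bigcap_{j \in P_i} C_j \right| \\
&\ge \left| \bigcap_{j \in Q} C_j \setminus \bigcup_{P'_i \in \wp'} \bigcap_{j \in P'_i} C_j \right|
 + \left| \bigcap_{j \in Q'} C_j \setminus \bigcup_{P_i \in \wp} \bigcap_{j \in P_i} C_j \right| \\
&\ge 2e.
\end{aligned}
$$

(2) This follows from Lemma 3.2.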

4 A decoding algorithm based on $(s, l)^e$-disjunct matrices

The methodology used by Kautz and Singleton (1964) has been generalized to a decoding method for pooling designs based on $d^e$-disjunct matrices (Huang and Weng 2003). In this section, we show that a similar argument also works for a decoding algorithm of pooling designs based on $(s, l)^e$-disjunct matrices.

Let $\chi_A$ with $A \subseteq [N]$ be the output vector for the group testing over the (to be identified) positive family $\wp = \{P_1, \ldots, P_k\}$. The following provides a decoding algorithm for the pooling design based on an $(s, l)^e$-disjunct matrix M.

Algorithm

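Input: the output $\chi_A$ associated with $A \subseteq [N]$

Output: positive complexes ℘

$\wp_A := \emptyset$

For each $Z \subseteq [t]$ with $|Z| \le l$ do

If $\left| \bigcap_{j \in Z} C_j \setminus \chi_A \right| \le \frac{e-1}{2}$ then add Z into $\wp_A$

If $|\wp_A| > s$ or $o(\wp_A) \ne \chi_A$ then output “there is an error” else output $\wp = \wp_A$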
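The steps above can be sketched in Python as follows. This is a direct transcription under stated assumptions rather than a definitive implementation: the names `decode` and `outcome` are hypothetical, indices are 0-based, candidate sets Z are taken to be non-empty subsets of the column indices, and no minimality filtering is applied to the candidates. Instead of printing “there is an error”, the sketch returns the candidate family together with a Boolean flag, a design choice made so that the corrected family of Theorem 4.1 (1) remains available when an error is detected.

```python
from itertools import combinations


def outcome(M, family):
    """Error-free outcome vector of the family with respect to the matrix M."""
    return [int(any(all(row[j] == 1 for j in P) for P in family)) for row in M]


def decode(M, chi, s, l, e):
    """Return (candidate family, error_detected) for the received vector chi."""
    n_rows, t = len(M), len(M[0])
    observed = {i for i in range(n_rows) if chi[i] == 1}  # the set A
    candidates = []
    for size in range(1, l + 1):
        for Z in combinations(range(t), size):
            # Pools containing every sample of Z: the intersection of the C_j, j in Z.
            common = {i for i in range(n_rows) if all(M[i][j] == 1 for j in Z)}
            # Integer form of the test |common - A| <= (e - 1) / 2.
            if 2 * len(common - observed) <= e - 1:
                candidates.append(frozenset(Z))
    error_detected = len(candidates) > s or outcome(M, candidates) != list(chi)
    return set(candidates), error_detected


# Hypothetical toy design: three stacked copies of the 3 x 3 identity matrix,
# used with s = 2, l = 1, e = 3.
M = [[1, 0, 0], [0, 1, 0], [0, 0, 1]] * 3
clean = outcome(M, [{0}, {2}])
noisy = list(clean)
noisy[1] ^= 1  # one erroneous test result

print(decode(M, clean, 2, 1, 3))  # family {{0}, {2}}, flag False
print(decode(M, noisy, 2, 1, 3))  # same family, flag True
```

In the noisy case the single flipped test is within the correction radius (e − 1)/2 = 1, so the candidate family is unchanged while the mismatch between its outcome vector and the received vector signals the error.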

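Theorem 4.1 Let $A \subseteq [N]$, and let $\wp$ and $\wp_A$ be as in the algorithm above.

(1) If $d_H(o(\wp), \chi_A) \le \frac{e-1}{2}$, then $\wp = \wp_A$.

(2) Suppose $d_H(o(\wp), \chi_A) \le e - 1$ and $|\wp_A| \le s$. Then $o(\wp) = \chi_A$ if and only if $o(\wp_A) = \chi_A$.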

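Proof (1) Since $\bigcap_{i \in Z} C_i \subseteq o(\wp)$ for any $Z \in \wp$, we have

$$\left| \bigcap_{j \in Z} C_j \setminus \chi_A \right| \le d_H(o(\wp), \chi_A) \le \frac{e-1}{2},$$

and it follows that $Z \in \wp_A$. On the other hand, if $Z \in \wp_A$ but $Z \notin \wp$, then $\left| \bigcap_{j \in Z} C_j \setminus o(\wp) \right| \ge e$ by definition. Since $d_H(o(\wp), \chi_A) \le \frac{e-1}{2}$, we then have

$$\left| \bigcap_{j \in Z} C_j \setminus \chi_A \right| \ge \frac{e-1}{2} + 1,$$

a contradiction.

(2) The claim is clear if $\wp = \wp_A$. Now suppose that $\wp \ne \wp_A$. Then

$$d_H(o(\wp), \chi_A) > \frac{e-1}{2}$$

as just shown; in particular, $o(\wp) \ne \chi_A$. By Lemma 3.2,

$$d_H(o(\wp_A), \chi_A) \ge d_H(o(\wp), o(\wp_A)) - d_H(o(\wp), \chi_A) \ge e - (e - 1) = 1,$$

and $o(\wp_A) \ne \chi_A$ as required.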

Acknowledgement The second author is supported by NSF of China.

References

Chen HB, Fu HL, Hwang FK (2008) An upper bound of the number of tests in pooling designs for the error-tolerant complex model. Opt Lett 2:425–431

D’yachkov AG, Rykov VV, Rashad AM (1989) Superimposed distance codes. Probl Control Inf Theory 18:237–250

D’yachkov AG, Vilenkin P, Macula AM, Torney D (2002) Families of finite sets in which no intersection of l sets is covered by the union of s others. J Comb Theory Ser A 99:195–218

Du D-Z, Hwang FK (2006) Pooling designs and nonadaptive group testing. World Scientific, Singapore

Du D-Z, Ngo HQ (2000) A Survey on Combinatorial Group Testing Algorithms with Applications to DNA Library Screening. DIMACS Ser Discrete Math Theor Comput Sci 55:171–182

Huang T, Weng C (2003) A note on decoding of superimposed codes. J Comb Optim 7:383–384

Huang T, Weng C (2004) Pooling spaces and non-adaptive pooling designs. Discrete Math 282:163–169

Huang H, Huang Y, Weng C (2007) More on pooling spaces. Discrete Math. doi:10.1016/j.disc.2007.11.073

Huang T, Wang K, Weng C (2008) Pooling spaces associated with finite geometry. Eur J Comb 29:1483–1491

Kautz W, Singleton R (1964) Nonrandom binary superimposed codes. IEEE Trans Inf Theory 10:363–377

Macula AJ (1997) Error-correcting nonadaptive group testing with $d^e$-disjunct matrices. Discrete Appl Math 80:217–222

Stinson DR, Wei R (2004) Generalized cover-free families. Discrete Math 279:463–477

Torney DC (1999) Sets pooling designs. Ann Comb 3:95–101