Improved construction for pooling design

J Comb Optim (2008) 15: 123–126 DOI 10.1007/s10878-007-9093-1

Ping Deng · F.K. Hwang · Weili Wu · David MacCallum · Feng Wang · Taieb Znati

Published online: 19 July 2007

Abstract Pooling design is a mathematical tool with many applications in molecular biology, especially in reducing the number of tests for DNA library screening. In this note, we study the construction of pooling designs and present an improvement to a recent construction given by Du et al. (J. Comput. Biol. 13:990–995, 2006).

Keywords Pooling design · Transversal design · Multiplication theorem · Disjunct matrix

P. Deng and W. Wu supported in part by National Science Foundation under grants CCF-0514796 and CNS-0524429. T. Znati supported in part by National Science Foundation under grant CCF-0548895.

P. Deng · W. Wu
Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA
P. Deng e-mail: pxd010100@utdallas.edu
W. Wu e-mail: weiliwu@utdallas.edu

F.K. Hwang
Department of Applied Mathematics, National Chiaotung University, Hsing Chu, Taiwan

D. MacCallum · F. Wang
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
D. MacCallum e-mail: fwang@cs.umn.edu
F. Wang e-mail: dmac@cs.umn.edu

T. Znati
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15215, USA
e-mail: znati@cs.pitt.edu

1 Introduction

Given a set of $n$ items with at most $d$ positive ones, we study the problem of identifying all positive items with as few tests as possible. Each test is applied to a subset of items, called a pool, and has two possible outcomes: positive if the pool contains a positive item, and negative if it contains none. This problem is called group testing. A group testing algorithm is said to be nonadaptive if all pools are given at the beginning, that is, no information on test outcomes is available for determining the pool of another test. Nonadaptive group testing is also called pooling design. A pooling design is transversal if its pools can be divided into disjoint families, each of which is a partition of all items.

Pooling design has many applications in molecular biology. The study of gene functions, an active research direction, requires DNA libraries of high quality, which are usually obtained through a large amount of testing and screening. Pooling design is a mathematical tool for reducing the number of tests needed for this work (Marathe et al. 2000; D’ychkov et al. 2001; Farach et al. 1997).

Du et al. (2006) gave a new construction for transversal designs. In this note, we propose an improvement.

A pooling design is usually represented by a binary matrix with rows indexed by pools and columns indexed by items. Cell $(i, j)$ contains a 1-entry if and only if the $i$th pool contains the $j$th item. This binary matrix is called the incidence matrix of the represented pooling design. By treating a column as the set of row indices at which the column has a 1-entry, we can talk about the union of several columns. A binary matrix is $(d; z)$-disjunct if for any $d + 1$ columns $C_0, C_1, \ldots, C_d$, there are at least $z$ rows at which $C_0$ contains a 1-entry and all of $C_1, \ldots, C_d$ contain 0-entries.
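The definition of a $(d; z)$-disjunct binary matrix can be checked by brute force on small examples. The following is a minimal sketch; the function name `is_binary_disjunct` and the list-of-rows representation are illustrative assumptions, not part of the original note.

```python
from itertools import combinations

def is_binary_disjunct(matrix, d, z=1):
    """Return True if the 0/1 matrix (a list of rows) is (d; z)-disjunct."""
    n_cols = len(matrix[0])
    for c0 in range(n_cols):
        others = [c for c in range(n_cols) if c != c0]
        for group in combinations(others, d):
            # Count rows where column c0 has a 1 and every column in group has a 0.
            good_rows = sum(
                1 for row in matrix
                if row[c0] == 1 and all(row[c] == 0 for c in group)
            )
            if good_rows < z:
                return False
    return True

# The 3 x 3 identity matrix: every column has one private row.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_binary_disjunct(identity, d=2, z=1))
```

The check enumerates all choices of $C_0$ and of the other $d$ columns, so it is only practical for small matrices.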

A $q$-nary matrix is a matrix with entries in $\{0, 1, \ldots, q - 1\}$. A transversal design can be represented by a $q$-nary matrix with rows indexed by families and columns indexed by items; cell $(i, j)$ contains entry $k$ if and only if item $j$ belongs to the $k$th pool in the $i$th family. This matrix representation is called a transversal matrix of the represented transversal design. A $q$-nary matrix is $(d, 1; z)$-disjunct if for any $d + 1$ columns $C_0, C_1, \ldots, C_d$, there are at least $z$ rows at each of which the entry of $C_0$ is different from the entry of $C_j$ for $1 \le j \le d$.

An $f \times n$ $q$-nary matrix can be transformed into an $fq \times n$ binary matrix by replacing each $i$-entry with the $q$-dimensional column vector $e_{i+1}$, which contains a 1-entry at the $(i+1)$th component and 0-entries at the other components. The following is known (Du et al. 2006).

Lemma 1 An $f \times n$ $q$-nary matrix is $(d, 1; z)$-disjunct if and only if it can be transformed into an $fq \times n$ $(d; z)$-disjunct binary matrix.
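The transformation from a $q$-nary transversal matrix to its binary incidence matrix can be sketched as follows. The function name `qnary_to_binary` is a hypothetical helper, and the small matrix is made-up illustration data.

```python
def qnary_to_binary(matrix, q):
    """Expand an f x n q-nary matrix into an (f*q) x n binary matrix."""
    binary = []
    for row in matrix:
        # Each q-nary row (a family) becomes q binary rows, one per pool:
        # entry i in column j puts a 1 in the row for pool i, column j.
        for symbol in range(q):
            binary.append([1 if entry == symbol else 0 for entry in row])
    return binary

# Two families, three items, pools labeled 0..2 (illustrative data only).
transversal = [[0, 1, 2],
               [0, 0, 1]]
for row in qnary_to_binary(transversal, q=3):
    print(row)
```

Every column of the result has exactly one 1-entry per family, which reflects the fact that each family partitions the items.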

Consider a finite field $GF(q)$ of order $q$. Suppose $k$ satisfies

$$n \le q^k \tag{1}$$

and

$$f = d(k - 1) + z \le q. \tag{2}$$

Let $M(d, n, q, k)$ be an $f \times n$ $q$-nary matrix with rows indexed by $f$ distinct elements of $GF(q)$ and columns indexed by $n$ distinct polynomials of degree at most $k - 1$ over $GF(q)$; cell $(x, g)$ contains the element $g(x)$ of $GF(q)$.

Theorem 2 M(d, n, q, k) is a (d, 1; z)-disjunct q-nary matrix.

Indeed, suppose for contradiction that $M(d, n, q, k)$ is not $(d, 1; z)$-disjunct. Then there are $d + 1$ columns $g_0, g_1, \ldots, g_d$ such that $g_0(x) \in \{g_1(x), \ldots, g_d(x)\}$ for at least $f - z + 1$ $(= d(k - 1) + 1)$ values of $x$. By the pigeonhole principle, there exists $j$, $1 \le j \le d$, such that $g_0(x) = g_j(x)$ for at least $k$ values of $x$. Since $g_0 - g_j$ has degree at most $k - 1$, this implies $g_0 = g_j$, a contradiction.
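Theorem 2 can be illustrated on a small instance. The sketch below assumes $q = p$ is prime, so that arithmetic in $GF(p)$ is integer arithmetic modulo $p$; prime powers would need a proper finite-field implementation. The function names and the parameter choice $d = 2$, $z = 1$, $k = 2$, $p = 5$ are illustrative assumptions.

```python
from itertools import combinations, product

def polynomial_matrix(p, k, f, n):
    """f x n matrix over GF(p), p prime: entry (x, g) is g(x) mod p."""
    assert n <= p ** k and f <= p
    # Columns: the first n coefficient tuples (c_0, ..., c_{k-1}),
    # i.e. polynomials of degree at most k - 1.
    polys = list(product(range(p), repeat=k))[:n]
    return [[sum(c * pow(x, i, p) for i, c in enumerate(g)) % p for g in polys]
            for x in range(f)]

def is_qnary_disjunct(matrix, d, z=1):
    """Brute-force check of the (d, 1; z)-disjunct property."""
    n_cols = len(matrix[0])
    for c0 in range(n_cols):
        others = [c for c in range(n_cols) if c != c0]
        for group in combinations(others, d):
            # Rows where the entry of c0 differs from every entry in group.
            good = sum(1 for row in matrix
                       if all(row[c] != row[c0] for c in group))
            if good < z:
                return False
    return True

d, z, k, p = 2, 1, 2, 5
f = d * (k - 1) + z          # f = 3 <= p, as required by condition (2)
M = polynomial_matrix(p, k, f, n=p ** k)
print(is_qnary_disjunct(M, d, z))
```

With only $f = 3$ rows the same matrix is not $(2, 1; 2)$-disjunct, which is consistent with condition (2) tying $f$ to $z$.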

By (1) and (2), $k$ and $q$ should be chosen to satisfy

$$\log_q n \le k \le \frac{q - z}{d} + 1. \tag{3}$$

There exists a positive integer $k$ satisfying (3) if $q$ satisfies

$$\log_q n \le \frac{q - z}{d}. \tag{4}$$

That is, it is sufficient to choose $q$ satisfying

$$n^d \le q^{q - z}. \tag{5}$$

Let $q_0$ be the smallest number $q$ satisfying (5). It is not hard to obtain the following estimate of $q_0$.

Lemma 3 $q_0 = z + (1 + o(1)) \frac{d \log_2 n}{\log_2(d \log_2 n)}$. Moreover, $q_0 \le z + \frac{2 d \log_2 n}{\log_2(d \log_2 n)}$ for $n^d \ge 2^4$.
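For concrete parameters, $q_0$ can be found by direct search and compared with the upper bound of Lemma 3. This is a minimal sketch that searches over integers $q \ge 2$ (an assumption; it does not restrict $q$ to prime powers), and the values $n = 1000$, $d = 3$ are arbitrary illustration choices.

```python
import math

def smallest_q(n, d, z=1):
    """Smallest integer q >= max(z, 2) with n**d <= q**(q - z), by linear search."""
    q = max(z, 2)
    while q ** (q - z) < n ** d:
        q += 1
    return q

def lemma3_upper_bound(n, d, z=1):
    """The bound z + 2 d log2(n) / log2(d log2(n)) from Lemma 3."""
    t = d * math.log2(n)
    return z + 2 * t / math.log2(t)

n, d, z = 1000, 3, 1
print(smallest_q(n, d, z), lemma3_upper_bound(n, d, z))
```

Exact integer arithmetic is used for the comparison in (5), so no rounding issues arise in the search itself.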

Now, we present an improvement based on the following multiplication theorem.

Theorem 4 If there exist a $q$-nary $(d, 1; z)$-disjunct $f \times n$ matrix $M_1$ and a $q'$-nary $(d, 1; z')$-disjunct $f' \times q$ matrix $M_2$, then there exists a $q'$-nary $(d, 1; zz')$-disjunct $ff' \times n$ matrix $M_3$.

Proof $M_3$ can be constructed from $M_1$ and $M_2$ by labeling the columns of $M_2$ with $0, 1, \ldots, q - 1$ and replacing each entry of $M_1$ by the corresponding column of $M_2$.

Consider $d + 1$ columns $C_0, C_1, \ldots, C_d$ of $M_3$. They are obtained from $d + 1$ columns $C'_0, C'_1, \ldots, C'_d$ of $M_1$, respectively. Since $M_1$ is $(d, 1; z)$-disjunct, there exist $z$ rows of $M_1$ at each of which the entry $a_0$ of $C'_0$ is different from the entries $a_1, \ldots, a_d$ of $C'_1, \ldots, C'_d$. Suppose $C''_0, \ldots, C''_d$ are the columns of $M_2$ corresponding to $a_0, \ldots, a_d$, respectively. Then $C''_0$ is different from $C''_1, \ldots, C''_d$. Since $M_2$ is $(d, 1; z')$-disjunct, there are $z'$ rows of $M_2$ at each of which the entry of $C''_0$ is different from the entries of $C''_1, \ldots, C''_d$. Applying this to each of the $z$ rows of $M_1$, $M_3$ has at least $zz'$ rows at each of which the entry of $C_0$ is different from the entries of $C_1, \ldots, C_d$.
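The construction in the proof can be tried on a toy instance. The sketch below assumes prime field sizes so that modular integer arithmetic suffices, and builds both factors with the polynomial construction; the helper names and the parameters ($d = 2$, a 5-nary $3 \times 25$ matrix and a 3-nary $3 \times 5$ matrix) are illustrative assumptions.

```python
from itertools import combinations, product

def poly_matrix(p, k, f, n):
    """f x n matrix over GF(p), p prime; entry (x, g) is g(x) mod p."""
    polys = list(product(range(p), repeat=k))[:n]
    return [[sum(c * pow(x, i, p) for i, c in enumerate(g)) % p for g in polys]
            for x in range(f)]

def multiply(M1, M2):
    """Replace each entry a of M1 by column a of M2: an (f*f') x n matrix."""
    # Each row of M1 expands into f' rows, one per row of M2.
    return [[row2[a] for a in row1] for row1 in M1 for row2 in M2]

def disjunct_level(matrix, d):
    """Largest z such that the q-nary matrix is (d, 1; z)-disjunct."""
    cols = range(len(matrix[0]))
    return min(
        sum(1 for row in matrix if all(row[c] != row[c0] for c in group))
        for c0 in cols
        for group in combinations([c for c in cols if c != c0], d)
    )

d = 2
M1 = poly_matrix(p=5, k=2, f=3, n=25)   # 5-nary, 3 x 25, (2, 1; 1)-disjunct
M2 = poly_matrix(p=3, k=2, f=3, n=5)    # 3-nary, 3 x 5,  (2, 1; 1)-disjunct
M3 = multiply(M1, M2)                   # 3-nary, 9 x 25
print(len(M3), len(M3[0]), disjunct_level(M3, d))
```

In this toy instance $f'q' = 9$ exceeds $q = 5$, so the product uses more binary tests than $M_1$ alone; the example only illustrates that disjunctness is preserved, not a saving in tests.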

If $f'q' < q$, then $ff'q' < fq$; that is, $M_3$ gives a transversal design with fewer tests than $M_1$ does. This multiplication theorem gives a trade-off between the number of tests and the number of families: the former is reduced by increasing the latter.

Let us consider the case $z = z' = 1$. By Theorem 2 and Lemma 3, we know that there exist a $(d, 1; 1)$-disjunct $q$-nary $f \times n$ matrix $M_1$ such that

$$f = (1 + o(1)) \frac{d \log_2 n}{\log_2(d \log_2 n)},$$

and a $(d, 1; 1)$-disjunct $q'$-nary $f' \times q$ matrix $M_2$ such that

$$f' = (1 + o(1)) \frac{d \log_2 q}{\log_2(d \log_2 q)},$$

where $q$ and $q'$ are prime powers satisfying $f \le q \le 2f$ and $f' \le q' \le 2f'$. Note that $\frac{d \log_2 n}{\log_2(d \log_2 n)}$ is increasing with respect to $n$ and $q < d \log n$ for sufficiently large $n$. Therefore, for sufficiently large $n$,

$$ff' = (1 + o(1)) \frac{d^2 \log_2 n}{\log_2(d \log_2(d \log_2 n))}.$$
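The last estimate can be unpacked as follows. This is a sketch of the intermediate step, assuming that the logarithm in the bound on $q$ is taken to base 2 and using the monotonicity noted above to substitute $d \log_2 n$ for $q$ in the expression for $f'$; strictly, this substitution yields an upper estimate.

```latex
\begin{align*}
f' &\le (1 + o(1)) \frac{d \log_2 (d \log_2 n)}{\log_2 (d \log_2 (d \log_2 n))}, \\
f f' &\le (1 + o(1)) \frac{d \log_2 n}{\log_2 (d \log_2 n)}
        \cdot \frac{d \log_2 (d \log_2 n)}{\log_2 (d \log_2 (d \log_2 n))}
      = (1 + o(1)) \frac{d^2 \log_2 n}{\log_2 (d \log_2 (d \log_2 n))}.
\end{align*}
```

The factor $\log_2(d \log_2 n)$ cancels between the two fractions, which is where the $d^2$ in the numerator and the nested logarithm in the denominator come from.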

As we apply the multiplication theorem more times, the order of $d$ will increase. Therefore, it is not clear whether this method would lead to a construction that approaches the lower bound on the number of tests in nonadaptive group testing.

References

Du D-Z, Hwang FK, Wu W, Znati T (2006) A new construction of transversal designs. J Comput Biol 13:990–995

D’ychkov AG, Macula AJ, Torney DC, Vilenkin PA (2001) Two models of nonadaptive group testing for designing screening experiments. In: Proceedings of the 6th International workshop on model-oriented designs and analysis, pp 63–75

Farach M, Kannan S, Knill E, Muthukrishnan S (1997) Group testing problem with sequences in experimental molecular biology. In: Proceedings of the compression and complexity of sequences, pp 357–367