Fast parallel molecular solution to the dominating-set problem on massively parallel bio-computing


Minyi Guo a,*, Michael (Shan-Hui) Ho b,1, Weng-Long Chang b,1

a Department of Computer Software, The University of Aizu, Aizu-Wakamatsu City, Fukushima 965-8580, Japan

b Department of Information Management, Southern Taiwan University of Technology, Tainan County, 710, R.O.C. Taiwan

Received 5 April 2004; revised 1 July 2004; accepted 15 July 2004. Available online 25 September 2004.

Abstract

This paper shows how to use DNA strands to construct a solution space of molecules for the dominating-set problem and how to apply biological operations to solve the problem from that solution space. To achieve this, we propose DNA-based parallel algorithms using the operations in the Adleman–Lipton model, together with an analysis of their computational complexity.

Keywords: Parallel biological computing; DNA-based computing; NP-complete problem; Dominating-set problem; DNA based parallel algorithms

* Corresponding author. Tel.: 0242 37 2557; fax: 0242 37 2744.

E-mail addresses: minyi@u-aizu.ac.jp (M. Guo), michael@mail.stut.edu.tw (Michael (Shan-Hui) Ho), changwl@mail.stut.edu.tw (W.-L. Chang).

1 Tel.: +886 6 2533131x4300; fax: +886 6 2541621.


1. Introduction

Through advances in molecular biology [1,2], it is now possible to produce 10^18 or more DNA strands in a tube. Those strands can represent 10^18 or more bits of information, and biological operations can act on all of those bits simultaneously. In other words, 10^18 or more data processors can be executed in parallel. Biological computing can therefore provide massive parallelism for dealing with real-world problems. In particular, problems in the NP-complete class are well known to be exponentially more difficult than evaluating determinants whose entries are merely numerical. Such problems are difficult to solve even on very large supercomputers once the problem size becomes large.

On the other hand, DNA computers have the full potential of high-performance computing technology. One test tube can be viewed as a processing unit, as in a standard computer architecture. Furthermore, DNA algorithms using biological operations have natural parallelism because DNA strands are separated (melted, annealed) in test tubes in parallel.

Feynman [29] first proposed molecular computation in 1961, but his idea was not implemented experimentally for a few decades. In 1994, Adleman [2] succeeded in solving an instance of the Hamiltonian path problem in a test tube, just by handling DNA strands. Lipton [3] demonstrated that Adleman's techniques could be used to solve the satisfiability problem (the first NP-complete problem). Adleman and coworkers [14] proposed stickers to enhance the Adleman–Lipton model.

In this paper, we use stickers to construct a solution space of DNA for the dominating-set problem. We also apply DNA operations in the Adleman–Lipton model to develop a DNA algorithm. The results of the proposed algorithm show that the dominating-set problem can be solved with biological operations in the Adleman–Lipton model on a sticker-based solution space. Furthermore, this work presents clear evidence of the ability of DNA-based computing to solve NP-complete problems.

The paper is organized as follows. Section 2 introduces the Adleman–Lipton model in detail and compares it with other models. Section 3 introduces a DNA algorithm for solving the dominating-set problem on the sticker-based solution space. Section 4 gives the experimental results of simulated DNA computing. Conclusions and future research work are presented in Section 5.

2. DNA model of computation

2.1. The Adleman–Lipton model

As described in [16], DNA (deoxyribonucleic acid) is the molecule that plays the main role in DNA-based computing. In the biochemical world of large and small molecules, polymers, and monomers, DNA is a polymer strung together from monomers called deoxyribonucleotides. Each deoxyribonucleotide contains three components: a sugar, a phosphate group, and a nitrogenous base. The sugar has five carbon atoms, which are given a fixed numbering for reference. Because the base also has carbons, the carbons of the sugar are numbered 1′ to 5′ (rather than 1–5) to avoid confusion. The phosphate group is attached to the 5′ carbon, and the base is attached to the 1′ carbon. Within the sugar structure there is a hydroxyl group attached to the 3′ carbon.

According to [1,16], distinct nucleotides are distinguished only by their bases, which come in two types: purines and pyrimidines. Purines include adenine and guanine, abbreviated A and G. Pyrimidines include cytosine and thymine, abbreviated C and T. Since nucleotides are distinguished only by their bases, they are simply represented as A, G, C, or T, depending on the type of base they have. The structure of a nucleotide, cited from [16], is simplified in Fig. 1. In this figure, B is one of the four possible bases (A, G, C, or T), P is the phosphate group, and the rest (the "stick") is the sugar base (with its carbons enumerated 1′ through 5′).

As indicated in [1,11,16], nucleotides can link together in two ways. First, the 5′-phosphate group of one nucleotide is joined with the 3′-hydroxyl group of another, forming a phosphodiester bond. The resulting molecule has the 5′-phosphate group of one nucleotide, denoted as the 5′ end, and the 3′-OH group of the other nucleotide available for bonding, denoted as the 3′ end. This gives the molecule a direction, and we can talk about the direction from the 5′ end to the 3′ end or from the 3′ end to the 5′ end. Second, the base of one nucleotide interacts with the base of another to form a hydrogen bond. This bonding is based on pairing: A and T can pair together, and C and G can pair together; no other pairings are possible. This pairing principle is called Watson–Crick complementarity (named after J.D. Watson and F.H.C. Crick, who deduced the famous double helix structure of DNA in 1953 and won the Nobel Prize for its discovery).

According to [1,11,16], a DNA strand is essentially a sequence (polymer) of four types of nucleotides, distinguished by the base each contains. Under appropriate conditions, two single strands of DNA can form a double strand if their respective bases are Watson–Crick complements of each other (A matches T, and C matches G) and the 3′ end of one matches the 5′ end of the other. The length of a single-stranded DNA is the number of nucleotides it comprises. Thus, a single-stranded DNA of 20 nucleotides is called a 20-mer (a polymer containing 20 monomers). The length of a double-stranded DNA (where each nucleotide is base paired) is counted in base pairs. Thus, a double-stranded DNA made from a single-stranded 20-mer has a length of 20 base pairs, also written 20 bp. (For more discussion of the relevant biological background, refer to [1,11,16].)
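The pairing rule can be illustrated in software. The following is a minimal sketch, assuming strands are written as Python strings in the 5′ to 3′ direction; the function names `reverse_complement` and `can_form_duplex` are hypothetical and chosen only for this illustration.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, also written 5' to 3'.

    The partner strand runs in the opposite direction, so the input
    is reversed before each base is replaced by its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def can_form_duplex(strand_a: str, strand_b: str) -> bool:
    """True if the two strands are exact Watson-Crick complements
    (an idealized model: no mismatches, equal length)."""
    return strand_b == reverse_complement(strand_a)


if __name__ == "__main__":
    print(reverse_complement("AACG"))
    print(can_form_duplex("AACG", "CGTT"))
```

The sketch ignores the reaction conditions (temperature, salt concentration) that decide whether annealing actually happens in a tube; it captures only the combinatorial rule.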


In the Adleman–Lipton model [2,3], splints were used to construct the corresponding edges of a particular graph of paths, which represented all possible binary numbers. As it stands, their construction indiscriminately builds all splints that lead to a complete graph, which means that hybridization has a higher probability of errors. Hence, Adleman and coworkers [14] proposed the sticker-based model, an abstract model of molecular computing based on DNA with a random access memory as well as a new form of encoding information.

The DNA operations in the Adleman–Lipton model [2,3,11,12] are described below. These operations will be used to find solutions of the dominating-set problem.

The Adleman–Lipton model:

A (test) tube is a set of molecules of DNA (i.e. a multi-set of finite strings over the alphabet {A, C, G, T}). Given a tube, one can perform the following operations:

1. Extract. Given a tube P and a short single strand of DNA, S, produces two tubes +(P, S) and −(P, S), where +(P, S) is all of the molecules of DNA in P which contain the strand S as a sub-strand and −(P, S) is all of the molecules of DNA in P which do not contain the short strand S.

2. Merge. Given tubes P1 and P2, yield ∪(P1, P2), where ∪(P1, P2) = P1 ∪ P2. This operation is to pour two tubes into one, with no change in the individual strands.

3. Detect. Given a tube P, we have 'yes' if P includes at least one DNA molecule, and we have 'no' if it contains none.

4. Discard. Given a tube P, the operation will discard the tube P.

5. Read. Given a tube P, the operation is used to describe a single molecule, which is contained in the tube P. Even if P contains many different molecules each encoding a different set of bases, the operation can give an explicit description of exactly one of them.
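The five operations can be mimicked in software to make their semantics concrete. The sketch below is an in-silico illustration only, not the laboratory procedure: it assumes a tube is modelled as a `collections.Counter` (a multi-set of strings), and that Extract tests plain substring containment. All function names are hypothetical.

```python
from collections import Counter

# A tube is modelled as a multi-set: strand string -> number of copies.
Tube = Counter


def extract(tube: Tube, s: str) -> tuple[Tube, Tube]:
    """Split a tube into +(P, S) and -(P, S) by substring containment."""
    plus = Counter({m: c for m, c in tube.items() if s in m})
    minus = Counter({m: c for m, c in tube.items() if s not in m})
    return plus, minus


def merge(p1: Tube, p2: Tube) -> Tube:
    """Pour two tubes into one; individual strands are unchanged."""
    return p1 + p2


def detect(tube: Tube) -> bool:
    """'yes' (True) if the tube holds at least one molecule."""
    return sum(tube.values()) > 0


def discard(tube: Tube) -> None:
    """Throw away the contents of the tube."""
    tube.clear()


def read(tube: Tube):
    """Describe exactly one molecule from the tube, or None if empty."""
    for molecule, copies in tube.items():
        if copies > 0:
            return molecule
    return None


if __name__ == "__main__":
    tube = Counter({"ACGT": 2, "TTTT": 1})
    plus, minus = extract(tube, "CG")
    print(detect(plus), read(plus), merge(plus, minus) == tube)
```

In this model each operation costs one step regardless of how many strands the tube holds, which is the sense in which the operations act on all strands in parallel.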

2.2. Comparison of the Adleman–Lipton model with other models

Techniques in the Adleman–Lipton model could be used to solve the NP-complete Hamiltonian path problem and the satisfiability (SAT) problem in linearly increasing time and with exponentially increasing volumes of DNA [2,3]. Ouyang et al. [4] showed that restriction enzymes could be used to solve the NP-complete maximal clique problem (MCP). The maximum number of vertices that can be processed is limited to 27 because the size of the pool increases exponentially with the size of the problem [4]. Arita et al. [5] described new molecular experimental techniques for searching for a Hamiltonian path. Morimoto et al. [6] offered a solid-phase method for finding a Hamiltonian path. Narayanan and Zorbala [7] showed that the Adleman–Lipton model could be extended towards solving the traveling salesman problem. Shin et al. [8] presented an encoding scheme that applies fixed-length codes for representing integer and real values. Their method could also be employed towards solving the traveling salesman problem. Amos [13] proposed a parallel filtering model for resolving the Hamiltonian path problem, the sub-graph isomorphism problem, the 3-vertex-colorability problem, the clique problem and the independent-set problem. In our previous work, Chang and Guo [17–20,25] showed how the DNA operations on the splint-based solution space in the Adleman–Lipton model could be employed for developing DNA algorithms to resolve the dominating-set problem, the vertex cover problem, the clique problem, the independent-set problem, the three-dimensional matching problem, the set-packing problem, the set cover problem and the problem of exact cover by 3-sets.

Roweis et al. [14] proposed the sticker-based model to enhance the Adleman–Lipton model. Their model could be used for determining solutions to the set cover problem. Perez-Jimenez et al. [15] employed the sticker-based model [14] to resolve knapsack problems. Fu [21] proposed new algorithms to resolve 3-SAT, 3-Coloring and the independent set problem. In our previous work, Chang et al. [26–28] also employed the sticker-based model and the Adleman–Lipton model for dealing with the subset-sum problem, Cook's theorem [9,10] and the set-splitting problem while decreasing the error rate of hybridization.

3. Using sticker for solving the dominating-set problem in the Adleman–Lipton model

3.1. Definition of the dominating-set problem

Mathematically, a dominating set of a graph G = (V, E), where V is the set of vertices and E is the set of edges, is a subset V1 ⊆ V of vertices such that for all u ∈ V − V1 there is a v ∈ V1 for which (u, v) ∈ E [9,10]. The dominating-set problem is to find a minimum-size dominating set in G. This has been proved to be an NP-complete problem [10].
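The definition translates directly into a conventional (non-DNA) check. The following sketch is a plain brute-force search given only to make the definition concrete; it is not the molecular algorithm of this paper. The function names and the three-vertex example graph (edges v1–v2 and v1–v3) are assumptions made for illustration.

```python
from itertools import combinations


def is_dominating(vertices, edges, subset) -> bool:
    """Check the definition: every vertex outside the subset must have
    at least one neighbour inside it (edges are undirected)."""
    chosen = set(subset)
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return all(u in chosen or bool(neighbours[u] & chosen) for u in vertices)


def minimum_dominating_set(vertices, edges) -> set:
    """Try subsets in order of increasing size; the first hit is minimum.

    This examines up to 2^n subsets, which is the exponential search
    space that makes the problem hard on a sequential machine.
    """
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_dominating(vertices, edges, subset):
                return set(subset)
    return set(vertices)


if __name__ == "__main__":
    vertices = ["v1", "v2", "v3"]
    edges = [("v1", "v2"), ("v1", "v3")]  # hypothetical example graph
    print(minimum_dominating_set(vertices, edges))
```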

The dominating-set problem asks: given a network consisting of n vertices and m edges, how many vertices are in a minimum-size dominating set? The graph shown in Fig. 2 includes three vertices and two edges, where each circle in the figure represents a vertex and the arc connecting two circles represents an edge. The minimum-size dominating set for the graph in Fig. 2 is {v1}. Hence, the size of the minimum dominating set for this graph is one. As indicated in [10], finding a minimum-size dominating set is an NP-complete problem, so it can be formulated as a "search" problem. The dominating-set problem is widely used in network routing, town


simulation for Step 4(a) generated the result yes, since the tube T1 is not empty. Therefore, in Step 4(b) of the simulation, the minimum-size dominating set from the tube T1 was shown in Table 10.

5. Conclusions and future work

The present method for solving the dominating-set problem is based on biological operations in the Adleman–Lipton model and the solution space of stickers in the sticker-based model, and is thus similar to the previously proposed method based on the solution space of splints for solving the same problem [17]. The proposed algorithm has three advantages drawn from the Adleman–Lipton model and the sticker-based model. First, the proposed algorithm has a lower rate of hybridization errors because we modified the Adleman program to generate good DNA sequences for constructing the solution space of stickers for the dominating-set problem. Only simple and fast biological operations in the Adleman–Lipton model were employed to solve the problem. Second, those biological operations in the Adleman–Lipton model had been performed in a fully automated manner in their lab. Full automation is essential not only for the speedup of computation but also for error-free computation. Third, in the proposed algorithm the number of tubes, the longest length of DNA library strands and the number of DNA library strands are, respectively, O(n), O(15·n) and O(2^n). This implies that the proposed algorithm can be easily performed in a fully automated manner in a lab. Furthermore, the present algorithm generates 2^n library strands, which satisfy the seven constraints in Section 3.2 and correspond to the 2^n possible dominating sets. This allows the present algorithm to be applied to a larger instance of the dominating-set problem.
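To give a feel for these bounds, the short sketch below evaluates them for concrete n. It rests on simplifying assumptions that are not claims of the paper: the O(15·n) bound is treated as exactly 15 bases per vertex, each library strand is counted once, and the strand budget of 10^18 is taken from the figure quoted in the introduction. The function names are hypothetical.

```python
def library_resources(n: int, bases_per_vertex: int = 15) -> dict:
    """Library size and strand length for an n-vertex instance,
    assuming exactly `bases_per_vertex` bases encode each vertex."""
    return {"strands": 2 ** n, "strand_length": bases_per_vertex * n}


def largest_n_within(strand_budget: int) -> int:
    """Largest n whose 2^n distinct library strands fit in the budget."""
    n = 0
    while 2 ** (n + 1) <= strand_budget:
        n += 1
    return n


if __name__ == "__main__":
    print(library_resources(3))
    print(largest_n_within(10 ** 18))
```

Under these assumptions the exponential library size, not the number of tubes or the strand length, is the resource that limits the instance size.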

Currently, there are many NP-complete problems that cannot be solved because it is very difficult to support basic biological operations using mathematical operations. We are not sure whether molecular computing can be applied to every NP-complete problem. Therefore, our main future work is to solve other NP-complete problems that remain unresolved with the Adleman–Lipton model and the sticker model.

References

[1] R.R. Sinden, DNA Structure and Function, Academic Press, New York, 1994.

[2] L. Adleman, Molecular computation of solutions to combinatorial problems, Science 266 (November) (1994) 1021–1024.

[3] R.J. Lipton, DNA solution of hard computational problems, Science 268 (1995) 542–545.

[4] Q. Ouyang, P.D. Kaplan, S. Liu, A. Libchaber, DNA solution of the maximal clique problem, Science 278 (1997) 446–449.

[5] M. Arita, A. Suyama, M. Hagiya, A heuristic approach for Hamiltonian path problem with molecules, in: Proceedings of 2nd Genetic Programming (GP-97), 1997, pp. 457–462.

[6] N. Morimoto, M. Arita, A. Suyama, Solid phase DNA solution to the Hamiltonian path problem, DIMACS (Series in Discrete Mathematics and Theoretical Computer Science) 48 (1999) 93–206.

[7] A. Narayanan, S. Zorbala, DNA algorithms for computing shortest paths, in: J.R. Koza et al. (Eds.), Genetic Programming 1998: Proceedings of the Third Annual Conference, 1998, pp. 718–724.

[8] S.-Y. Shin, B.-T. Zhang, S.-S. Jun, Solving traveling salesman problems using molecular programming, in: Proceedings of the 1999 Congress on Evolutionary Computation (CEC99), vol. 2, 1999, pp. 994–1000.

[9] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, MIT Press, Cambridge, MA, ISBN 0-262-03141-8.

[10] M.R. Garey, D.S. Johnson, Computers and Intractability, Freeman, San Francisco, CA, 1979.

[11] D. Boneh, C. Dunworth, R.J. Lipton, J. Sgall, On the computational power of DNA, in: Discrete Applied Mathematics, Special Issue on Computational Molecular Biology, vol. 71, 1996, pp. 79–94.

[12] L.M. Adleman, On constructing a molecular computer, DNA Based Computers, in: R. Lipton, E. Baum (Eds.), DIMACS series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, 1996, pp. 1–21.

[13] M. Amos, DNA Computation, Ph.D. Thesis, Department of Computer Science, The University of Warwick, 1997.

[14] S. Roweis, E. Winfree, R. Burgoyne, N.V. Chelyapov, M.F. Goodman, P.W.K. Rothemund, L.M. Adleman, A sticker based model for DNA computation, in: L. Landweber, E. Baum (Eds.), 2nd Annual Workshop on DNA Computing, DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, Princeton University, 1999, pp. 1–29.

[15] M.J. Perez-Jimenez, F. Sancho-Caparrini, Solving knapsack problems in a sticker based model, in: 7th Annual Workshop on DNA Computing, DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, 2001.

[16] G. Paun, G. Rozenberg, A. Salomaa, DNA Computing: New Computing Paradigms, Springer-Verlag, New York, 1998, ISBN: 3-540-64196-3.

[17] W.-L. Chang, M. Guo, Solving the dominating-set problem in Adleman–Lipton's Model, in: The Third International Conference on Parallel and Distributed Computing, Applications and Technologies, Japan, 2002, pp. 167–172.

[18] W.-L. Chang, M. Guo, Solving the clique problem and the vertex cover problem in Adleman–Lipton's model, in: IASTED International Conference, Networks, Parallel and Distributed Processing, and Applications, Japan, 2002, pp. 431–436.

[19] W.-L. Chang, M. Guo, Solving NP-complete problem in the Adleman–Lipton Model, in: The Proceedings of 2002 International Conference on Computer and Information Technology, Japan, 2002, pp. 157–162.

[20] W.-L. Chang, M. Guo, Resolving the 3-dimensional matching problem and the set packing problem in Adleman–Lipton's model, in: IASTED International Conference, Networks, Parallel and Distributed Processing, and Applications, Japan, 2002, pp. 455–460.

[21] B. Fu, Volume Bounded Molecular Computation, Ph.D. Thesis, Department of Computer Science, Yale University, 1997.

[22] R.S. Braich, C. Johnson, P.W.K. Rothemund, D. Hwang, N. Chelyapov, L.M. Adleman, Solution of a satisfiability problem on a gel-based DNA computer, in: Proceedings of the 6th International Conference on DNA Computation in the Springer-Verlag Lecture Notes in Computer Science series.

[23] K. Mir, A restricted genetic alphabet for DNA computing, in: E.B. Baum, L.F. Landweber (Eds.), DNA Based Computers II: DIMACS Workshop, June 10–12, 1996, volume 44 of DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, Providence, RI, 1998, pp. 243–246.

[24] A.R. Cukras, D. Faulhammer, R.J. Lipton, L.F. Landweber, Chess games: a model for RNA-based computation, in: Proceedings of the 4th DIMACS Meeting on DNA Based Computers, Held at the University of Pennsylvania, June 16–19, 1998, pp. 27–37.

[25] W.-L. Chang, M. Guo, Solving the set cover problem and the problem of exact cover by 3-sets in the Adleman–Lipton model, Biosystems 72 (3) (2003) 263–275.

[26] W.-L. Chang, M.(S.-H.) Ho, M. Guo, Molecular solutions for the subset-sum problem on DNA-based supercomputing, Biosystems 73 (2) (2004) 117–130.

[27] M. Ho, W.-L. Chang, M. Guo, Is Cook's theorem correct for DNA-based computing: towards solving the NP-complete problems on a DNA-based supercomputer model, Journal of Parallel and Distributed Scientific and Engineering Computing, in press.

[28] W.-L. Chang, M. Guo, M. Ho, Solving the set-splitting problem in sticker-based model and the Adleman–Lipton model, Future Generation Computer Systems, in press.

[29] R.P. Feynman, in: D.H. Gilbert (Ed.), Miniaturization, Reinhold Publishing Corporation, New York, 1961, pp. 282–296.