Is optimal solution of every NP-complete or NP-hard problem determined from its characteristic for DNA-based computing


Minyi Guo a,b,∗, Weng-Long Chang c, Machael Ho c, Jian Lu b, Jiannong Cao d

a Department of Computer Software, The University of Aizu, Aizu-Wakamatsu City, Fukushima 965-8580, Japan

b State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, PR China

c Department of Information Management, Southern Taiwan University of Technology, Tainan County, Taiwan

d Department of Computing, Hong Kong Polytechnic University, Hong Kong

Received 1 July 2004; received in revised form 15 October 2004; accepted 16 October 2004

Abstract

Cook’s Theorem [Cormen, T.H., Leiserson, C.E., Rivest, R.L., 2001. Introduction to Algorithms, second ed., The MIT Press; Garey, M.R., Johnson, D.S., 1979. Computers and Intractability, Freeman, San Francisco, CA] states that if an algorithm is developed for one NP-complete or NP-hard problem, then other problems can be solved by means of reduction to that problem. Cook’s Theorem has been demonstrated to be correct on a general digital electronic computer. In this paper, we first propose a DNA algorithm for solving the vertex-cover problem. We then demonstrate that if the size of a reduced NP-complete or NP-hard problem is equal to or less than that of the vertex-cover problem, the proposed algorithm can be used directly to solve the reduced problem, and Cook’s Theorem is correct on DNA-based computing. Otherwise, a new DNA algorithm for the optimal solution of the reduced NP-complete or NP-hard problem should be developed from the characteristics of NP-complete or NP-hard problems.

Keywords: Molecular computing; DNA-based computing; NP-complete problems; NP-hard problems; Vertex-cover problem; Cook’s Theorem.

1. Introduction

Adleman wrote the first paper demonstrating that deoxyribonucleic acid (DNA) strands could be used to find solutions to an instance of the NP-complete Hamiltonian path problem (HPP) (Adleman, 1994). Lipton wrote the second paper, which showed that the Adleman techniques could also be used to solve the NP-complete satisfiability (SAT) problem (the first NP-complete problem) (Lipton, 1995). Adleman and co-workers proposed stickers for enhancing the Adleman–Lipton model (Roweis et al., 1999).

This work is supported in part by China National 973 Program under Grant 2002CB312002 and China NSFC (60273034, 60233010).

∗ Corresponding author. Tel.: +81 2423 72557; fax: +81 2423 72744.

E-mail address: [email protected] (M. Guo).

In this paper, we use stickers to construct the solution space of DNA library sequences for the vertex-cover problem. We also apply DNA operations in the Adleman–Lipton model to develop a DNA algorithm. The main result of the proposed DNA algorithm shows that the vertex-cover problem is solved with biological operations in the Adleman–Lipton model from the solution space of stickers. Furthermore, if the size of a reduced NP-complete or a reduced NP-hard problem is equal to or less than that of the vertex-cover problem, then the proposed algorithm can be used directly to solve the reduced NP-complete or NP-hard problem.

0303-2647/$ – see front matter © 2004 Elsevier Ireland Ltd. All rights reserved.

The rest of this paper is organized as follows. In Section 2, the Adleman–Lipton model is introduced and compared with other models. In Section 3, the first DNA algorithm is proposed for solving the vertex-cover problem from the solution space of stickers in the Adleman–Lipton model. In Section 4, the experimental result of simulated DNA computing is given. Conclusions are drawn in Section 5.

2. DNA model of computation

In Section 2.1, a summary of DNA structure and the Adleman–Lipton model is described in detail. In Section 2.2, the comparison of the Adleman–Lipton model with other models is also introduced.

2.1. The Adleman–Lipton model

DNA is the molecule that plays the main role in DNA-based computing (Paun et al., 1998). In the biochemical world of large and small molecules, polymers and monomers, DNA is a polymer strung together from monomers called deoxyribonucleotides. Each deoxyribonucleotide contains three components: a sugar, a phosphate group and a nitrogenous base. The sugar has five carbon atoms, which are given a fixed numbering for the sake of reference. Because the base also has carbons, the carbons of the sugar are numbered from 1′ to 5′ (rather than from 1 to 5) to avoid confusion. The phosphate group is attached to the 5′ carbon, and the base is attached to the 1′ carbon. Within the sugar structure there is a hydroxyl group attached to the 3′ carbon.

Distinct nucleotides are distinguished only by their bases, which come in two sorts: purines and pyrimidines (Sinden, 1994; Paun et al., 1998). Purines include adenine and guanine, abbreviated A and G. Pyrimidines include cytosine and thymine, abbreviated C and T. Because nucleotides are distinguished only by their bases, they are simply referred to as A, G, C, or T nucleotides, depending upon the base that they have. The structure of a nucleotide is illustrated (in a very simplified way) in Fig. 1. In Fig. 1, B is one of the four possible bases (A, G, C, or T), P is the phosphate group and the rest (the “stick”) is the sugar base (with its carbons enumerated 1′ through 5′).

Fig. 1. A schematic representation of a nucleotide.

Nucleotides can link together in two different ways (Sinden, 1994; Boneh et al., 1996; Paun et al., 1998). In the first, the 5′-phosphate group of one nucleotide is joined with the 3′-hydroxyl group of the other, forming a phosphodiester bond. The resulting molecule has the 5′-phosphate group of one nucleotide, denoted as the 5′ end, and the 3′-OH group of the other nucleotide, denoted as the 3′ end, available for bonding. This gives the molecule its directionality, and we can talk about the direction from the 5′ end to the 3′ end or from the 3′ end to the 5′ end. In the second, the base of one nucleotide interacts with the base of the other to form a hydrogen bond. This bonding is subject to the following restriction on base pairing: A and T can pair together, and C and G can pair together; no other pairings are possible. This pairing principle is called Watson–Crick complementarity (named after James D. Watson and Francis H.C. Crick, who deduced the famous double helix structure of DNA in 1953 and won the Nobel Prize for the discovery).

A DNA strand is essentially a sequence (polymer) of four types of nucleotides, distinguished by the one of four bases they contain. Two strands of DNA can form (under appropriate conditions) a double strand if the respective bases are the Watson–Crick complements of each other: A matches T and C matches G; also, the 3′ end matches the 5′ end. The length of a single-stranded DNA is the number of nucleotides comprising the single strand. Thus, if a single-stranded DNA includes 20 nucleotides, then we say that it is a 20 mer (it is a polymer containing 20 monomers). The length of a double-stranded DNA (where each nucleotide is base paired) is counted in the number of base pairs. Thus, if we make a double-stranded DNA from a single-stranded 20 mer, then the length of the double-stranded DNA is 20 base pairs, also written 20 bp (for more discussion of the relevant biological background refer to Sinden, 1994; Boneh et al., 1996; Paun et al., 1998).
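The pairing rule described above can be sketched in a few lines of Python. This is only an illustration: the function names `reverse_complement` and `can_form_duplex` are hypothetical, and real hybridization also depends on temperature, salt concentration and partial matches, which this sketch ignores.

```python
# Watson-Crick complementarity: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3'.

    The partner runs antiparallel (its 3' end faces the 5' end of
    `strand`), so the sequence is reversed before complementing.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def can_form_duplex(s1: str, s2: str) -> bool:
    """True if s1 and s2 (both written 5' to 3') are exact complements."""
    return s2 == reverse_complement(s1)


strand = "AACG"                      # a 4 mer, written 5' to 3'
partner = reverse_complement(strand)
print(partner)                       # CGTT
print(len(strand), "mer pairs into a", len(strand), "bp duplex")
print(can_form_duplex(strand, partner))  # True
```

The length bookkeeping matches the text: a single strand of k nucleotides is a k mer, and the fully paired double strand built from it has k base pairs.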

In the Adleman–Lipton model (Adleman, 1994; Lipton, 1995), splints were used to correspond to the edges of a particular graph, the paths of which represented all possible binary numbers. As it stands, their construction indiscriminately builds all splints that lead to a complete graph, which means that hybridization has a higher probability of errors. Hence, Adleman and co-workers (Roweis et al., 1999) proposed the sticker-based model, an abstract model of molecular computing based on DNA with a random access memory and a new form of encoding information, to enhance the Adleman–Lipton model.

The DNA operations in the Adleman–Lipton model are described below (Adleman, 1994; Lipton, 1995; Boneh et al., 1996; Adleman, 1996). These operations will be used for figuring out solutions of the vertex-cover problem.

The Adleman–Lipton model

A (test) tube is a set of molecules of DNA (i.e., a multi-set of finite strings over the alphabet {A, C, G, T}). Given a tube, one can perform the following operations:

(1) Extract. Given a tube P and a short single strand of DNA, S, produce two tubes +(P, S) and −(P, S), where +(P, S) is all of the molecules of DNA in P which contain the strand S as a sub-strand and −(P, S) is all of the molecules of DNA in P which do not contain the short strand S.

(2) Merge. Given tubes P1 and P2, yield ∪(P1, P2), where ∪(P1, P2) = P1 ∪ P2. This operation pours two tubes into one, with no change to the individual strands.

(3) Detect. Given a tube P, say ‘yes’ if P includes at least one DNA molecule, and say ‘no’ if it contains none.

(4) Discard. Given a tube P, the operation will discard the tube P.

(5) Read. Given a tube P, the operation is used to describe a single molecule which is contained in the tube P. Even if P contains many different molecules each encoding a different set of bases, the operation can give an explicit description of exactly one of them.
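The five operations can be mimicked in software by treating a tube as a multi-set of strings. The following is a minimal simulation sketch, not the laboratory procedure: it assumes a tube is a Python `Counter` mapping each strand to its number of copies, and the function names are chosen here only to mirror the operation names above.

```python
from collections import Counter


def extract(tube: Counter, s: str):
    """Split `tube` into +(P, S) and -(P, S) by substring test."""
    plus, minus = Counter(), Counter()
    for strand, copies in tube.items():
        target = plus if s in strand else minus
        target[strand] += copies
    return plus, minus


def merge(p1: Counter, p2: Counter) -> Counter:
    """Pour two tubes into one; strands themselves are unchanged."""
    return p1 + p2


def detect(tube: Counter) -> bool:
    """Say 'yes' (True) if the tube holds at least one molecule."""
    return sum(tube.values()) > 0


def discard(tube: Counter) -> None:
    """Throw the contents of the tube away."""
    tube.clear()


def read(tube: Counter):
    """Describe exactly one molecule from the tube, or None if empty."""
    for strand, copies in tube.items():
        if copies > 0:
            return strand
    return None


tube = Counter({"ACGT": 2, "TTTT": 1})
plus, minus = extract(tube, "CG")
print(dict(plus), dict(minus))       # {'ACGT': 2} {'TTTT': 1}
print(detect(plus), read(plus))      # True ACGT
print(merge(plus, minus) == tube)    # True
```

In the model, Extract is realized chemically by hybridization against the short strand; the substring test here only reproduces its abstract input and output behaviour.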

2.2. Other related work and comparison with the Adleman–Lipton model

Based on the solution space of splints in the Adleman–Lipton model, the methods of Narayanan and Zorbala (1998) and Chang and Guo (2002a, 2002b, 2002c, 2002d) could be applied towards solving the traveling salesman problem, the dominating-set problem, the vertex-cover problem, the clique problem, the independent-set problem, the 3-dimensional matching problem and the set-packing problem. Those methods require exponentially increasing volumes of DNA and linearly increasing time. LaBean et al. (2000) proposed an $n \cdot 1.89^{n}$ volume, $O(n^2 + m^2)$ time molecular algorithm for the 3-coloring problem and a $1.51^{n}$ volume, $O(n^2 m^2)$ time molecular algorithm for the independent set problem, where n and m are, respectively, the number of vertices and the number of edges in the problems resolved. Fu (1997) presented a polynomial-time algorithm with a $1.497^{n}$ volume for the 3-SAT problem, a polynomial-time algorithm with a $1.345^{n}$ volume for the 3-coloring problem and a polynomial-time algorithm with a $1.229^{n}$ volume for the independent set problem. Though those volumes (Fu, 1997; LaBean et al., 2000) are smaller, constructing them is more difficult and the time complexity of the methods is much higher.

Quyang et al. (1997) showed that restriction enzymes could be used to solve the NP-complete clique problem. The maximum number of vertices that they can process is limited to 27 because the size of the pool increases exponentially with the size of the problem (Quyang et al., 1997). Shin et al. (1999) presented an encoding scheme for decreasing the error rate in hybridization. The method (Shin et al., 1999) could be employed towards solving the traveling salesman problem by representing integer and real values with fixed-length codes. Arita et al. (1997) and Morimoto et al. (1999), respectively, proposed new molecular experimental techniques and a solid-phase method to find a Hamiltonian path. Amos (1997) proposed a parallel filtering model for resolving the Hamiltonian path problem, the sub-graph isomorphism problem, the 3-vertex-colorability problem, the clique problem and the independent-set problem. Those methods (Arita et al., 1997; Morimoto et al., 1999; Amos, 1997) have a lower error rate in real molecular experiments.

In the literature (Reif et al., 2000; LaBean et al., 2000; LaBean and Reif, 2001), the methods for DNA-based computing by self-assembly require the use of DNA nanostructures, called tiles, that have efficient chemistries, expressive computational power, and convenient input and output (I/O) mechanisms. DNA tiles have a very low error rate in self-assembly. Garzon and Deaton (1999) presented a review of the most important advances in molecular computing.

Adleman and co-workers (Roweis et al., 1999) proposed the sticker-based model to reduce the error rate of hybridization in the Adleman–Lipton model. Their model could be used for determining solutions to an instance of the set cover problem. Perez-Jimenez and Sancho-Caparrini (2001) employed the sticker-based model (Roweis et al., 1999) to solve knapsack problems. In our previous work, Chang and Guo (2004) and Chang et al. (2003) also employed the sticker-based model and the Adleman–Lipton model to deal with the dominating-set problem and the set-splitting problem while decreasing the error rate of hybridization.

3. Using sticker for solving the vertex-cover problem in the Adleman–Lipton model

In Section 3.1, the vertex-cover problem is described. Applying stickers to construct the solution space of DNA sequences for the vertex-cover problem is introduced in Section 3.2. In Section 3.3, a DNA algorithm is proposed to solve the vertex-cover problem. In Section 3.4, the complexity of the proposed algorithm is given. In Section 3.5, the range of application of the famous Cook’s Theorem in molecular computing is described.

3.1. Definition of the vertex-cover problem

Assume that a graph G can be represented as $G = (V, E)$, where $V = \{v_1, \ldots, v_n\}$ is a set of vertices in G and $E = \{(v_a, v_b) \mid v_a \text{ and } v_b \text{ are, respectively, vertices in } V\}$ is a set of edges in G. $|V| = n$ is the number of vertices in V and $|E| = m$ is the number of edges in E.

Mathematically, a vertex cover of a graph G is a subset $V_1 \subseteq V$ of vertices such that for each edge $(v_a, v_b)$ in E, at least one of $v_a$ and $v_b$ belongs to $V_1$ (Cormen et al., 2001; Garey and Johnson, 1979). The vertex-cover problem is to find a minimum-size vertex cover from G. The problem has been shown to be an NP-complete problem (Garey and Johnson, 1979).

Fig. 2. The graph G of our problem.

The graph in Fig. 2 denotes such a problem. In Fig. 2, the graph G contains three vertices and two edges. The minimum-size vertex cover for G is $\{v_1\}$. Hence, the size of the vertex-cover problem in Fig. 2 is one. Garey and Johnson (1979) indicate that finding a minimum-size vertex cover is an NP-complete problem, and it can be formulated as a search problem.
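To make the search formulation concrete, the following sketch checks the definition directly and tries subsets in order of increasing size. It is a conventional brute-force illustration, not the DNA algorithm of this paper. The edge list for Fig. 2 is an assumption: the figure is not reproduced here, but a simple graph with three vertices, two edges and minimum cover {v1} must have both edges incident to v1.

```python
from itertools import combinations


def is_vertex_cover(subset, edges):
    """True if every edge has at least one endpoint in `subset`."""
    return all(a in subset or b in subset for a, b in edges)


def minimum_vertex_cover(vertices, edges):
    """Exhaustive search: smallest subsets are tried first."""
    for k in range(len(vertices) + 1):
        for candidate in combinations(vertices, k):
            if is_vertex_cover(set(candidate), edges):
                return set(candidate)


# Assumed edge set for the graph of Fig. 2 (inferred, see lead-in).
V = ["v1", "v2", "v3"]
E = [("v1", "v2"), ("v1", "v3")]

print(minimum_vertex_cover(V, E))  # {'v1'}
```

The search inspects up to 2^n subsets, which is the exponential solution space that the DNA approach generates in a test tube instead of enumerating sequentially.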

3.2. Using sticker for constructing solution space of DNA sequence for the vertex-cover problem

The first step in the Adleman–Lipton model is to generate the solution space of DNA sequences for the problem to be solved. Next, basic biological operations are used to remove illegal solutions and find legal solutions from the solution space. Thus, the first step of solving the vertex-cover problem is to generate a test tube which includes all of the possible vertex covers. Assume that an n-digit binary number corresponds to each possible vertex cover of any n-vertex graph, G. Also suppose that $V_1$ is a vertex cover for G. If the ith bit in an n-digit binary number is set to 1, then the corresponding vertex is in $V_1$. If the ith bit in an n-digit binary number is set to 0, then the corresponding vertex is not in $V_1$.
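The bit encoding can be sketched as follows. This is an illustrative helper, not part of the paper's algorithm; the names `solution_space` and `bits_to_subset` are hypothetical, and the bit order (rightmost bit stands for v1) is assumed from the convention used for the example graph, where 001 denotes {v1}.

```python
def solution_space(n: int):
    """All n-digit binary numbers, one per candidate vertex subset."""
    return [format(x, f"0{n}b") for x in range(2 ** n)]


def bits_to_subset(bits: str):
    """Map an n-digit binary number to the vertex subset it encodes.

    Assumed convention: the rightmost bit corresponds to v1 and the
    leftmost bit to vn.
    """
    n = len(bits)
    return {f"v{n - i}" for i, bit in enumerate(bits) if bit == "1"}


for bits in solution_space(3):
    print(bits, sorted(bits_to_subset(bits)))
```

For n = 3 this lists eight candidates, from 000 (the empty subset) to 111 (all three vertices); the illegal candidates are the ones the biological operations are meant to remove.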

In this way, all of the possible vertex covers in G are transformed into an ensemble of all n-digit binary numbers. Table 1 denotes the solution space for the graph in Fig. 2. The binary number 000 in Table 1 represents that the corresponding vertex cover is empty. The binary numbers 001, 010 and 011 in Table 1 represent that the corresponding vertex covers are $\{v_1\}$, $\{v_2\}$ and $\{v_2, v_1\}$, respectively. The binary numbers 100, 101 and 110 in Table 1 represent that those corresponding vertex covers, subse-

Table 8

DNA sequences generated by Step 3 representing the vertex covers that include two vertices

5′-ATTCTAACTCTACCTATTCACTTCTTTAATTTTCAATAACACCTC-3′
5′-AACATACCCCTAATCTCTAATATAATTACTTTTCAATAACACCTC-3′
5′-AACATACCCCTAATCATTCACTTCTTTAATAAAACTCACCCTCCT-3′

Table 9

DNA sequence generated by Step 3 representing the vertex cover that contains three vertices

5′-AACATACCCCTAATCATTCACTTCTTTAATTTTCAATAACACCTC-3′

Table 10

DNA sequence generated by Step 4 representing the minimum-size vertex cover

5′-ATTCTAACTCTACCTTCTAATATAATTACTTTTCAATAACACCTC-3′
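The strands in Tables 8 to 10 can be read back into binary numbers with a short sketch. Two assumptions are made here and are not stated in the surviving text: each 45-base strand consists of three 15-base blocks, one per vertex, and the Table 9 strand carries the value 1 in every block because its cover contains all three vertices. Any block that differs from the Table 9 block at the same position is then read as 0.

```python
BLOCK = 15  # assumed length of one value-encoding block

# Strands copied from Tables 8-10, written 5' to 3'.
TABLE_8 = [
    "ATTCTAACTCTACCTATTCACTTCTTTAATTTTCAATAACACCTC",
    "AACATACCCCTAATCTCTAATATAATTACTTTTCAATAACACCTC",
    "AACATACCCCTAATCATTCACTTCTTTAATAAAACTCACCCTCCT",
]
TABLE_9 = "AACATACCCCTAATCATTCACTTCTTTAATTTTCAATAACACCTC"
TABLE_10 = "ATTCTAACTCTACCTTCTAATATAATTACTTTTCAATAACACCTC"


def blocks(strand: str):
    """Cut a strand into consecutive BLOCK-base pieces."""
    return [strand[i:i + BLOCK] for i in range(0, len(strand), BLOCK)]


def decode(strand: str, all_ones: str = TABLE_9) -> str:
    """Read a strand as bits: 1 where a block equals the all-ones block."""
    return "".join(
        "1" if piece == ref else "0"
        for piece, ref in zip(blocks(strand), blocks(all_ones))
    )


for strand in TABLE_8 + [TABLE_9, TABLE_10]:
    print(decode(strand))
```

Under these assumptions the Table 8 strands decode to 011, 101 and 110 (two vertices each), and the Table 10 strand decodes to 001, which agrees with the minimum-size vertex cover {v1} stated in Section 3.1.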

5. Conclusions

Cook’s Theorem states that if an algorithm is developed for one NP-complete or NP-hard problem, then other problems can be solved by means of reduction to that problem. Cook’s Theorem has been demonstrated to be correct on a general digital electronic computer. In this paper, we showed, from Theorems 3.3 to 3.6, that if the size of a reduced NP-complete problem is equal to or less than that of the vertex-cover problem, then Cook’s Theorem is correct in molecular computing. Otherwise, a new DNA algorithm for the optimal solution of a reduced NP-complete or NP-hard problem should be developed from the characteristics of NP-complete or NP-hard problems.

Chang and Guo (2002b, 2002d) applied splints to construct the solution space of DNA sequences for solving the vertex-cover problem in the Adleman–Lipton model. With splints, hybridization has a higher probability of errors. Adleman and co-workers (Roweis et al., 1999) proposed stickers to decrease the probability of hybridization errors in the Adleman–Lipton model. The main result of the proposed algorithm shows that the vertex-cover problem is solved with biological operations in the Adleman–Lipton model from the solution space of stickers. Furthermore, this work represents clear evidence for the ability of DNA-based computing to solve NP-complete problems.

Currently, many NP-complete problems remain unsolved in this setting because it is very difficult to use basic biological operations to replace mathematical operations. We are not sure whether molecular computing can be applied to every NP-complete problem. Therefore, our main future work is to solve other unsolved NP-complete problems with the Adleman–Lipton model and the sticker model, or to develop a new model.

References

Sinden, R.R., 1994. DNA Structure and Function. Academic Press.

Adleman, L., 1994. Molecular computation of solutions to combinatorial problems. Science 266, 1021–1024.

Lipton, R.J., 1995. DNA solution of hard computational problems. Science 268, 542–545.

Quyang, Q., Kaplan, P.D., Liu, S., Libchaber, A., 1997. DNA solution of the maximal clique problem. Science 278, 446–449.

Arita, M., Suyama, A., Hagiya, M., 1997. A heuristic approach for Hamiltonian path problem with molecules. In: Proceedings of Second Genetic Programming (GP-97), pp. 457–462.

Morimoto, N., Arita, M., Suyama, A., 1999. Solid phase DNA solution to the Hamiltonian path problem. In: Series in Discrete Mathematics and Theoretical Computer Science, vol. 48, pp. 93–206.

Narayanan, A., Zorbala, S., 1998. DNA algorithms for computing shortest paths. In: Koza, J.R., et al. (Eds.), Genetic Programming 1998: Proceedings of the Third Annual Conference, pp. 718–724.

Shin, S.-Y., Zhang, B.-T., Jun, S.-S., 1999. Solving traveling salesman problems using molecular programming. In: Proceedings of the 1999 Congress on Evolutionary Computation (CEC99), vol. 2, pp. 994–1000.

Cormen, T.H., Leiserson, C.E., Rivest, R.L., 2001. Introduction to Algorithms, second ed. The MIT Press.

Garey, M.R., Johnson, D.S., 1979. Computers and Intractability. Freeman, San Francisco, CA.

Boneh, D., Dunworth, C., Lipton, R.J., Sgall, J., 1996. On the computational power of DNA. In: Discrete Applied Mathematics, vol. 71, pp. 79–94 (special issue on computational molecular biology).

Adleman, L.M., 1996. On constructing a molecular computer. DNA based computers. In: Lipton, R., Baum, E. (Eds.), Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, pp. 1–21.

Amos, M., 1997. DNA computation, Ph.D. Thesis, Department of Computer Science, The University of Warwick.

Roweis, S., Winfree, E., Burgoyne, R., Chelyapov, N.V., Goodman, M.F., Rothemund, P.W.K., Adleman, L.M., 1999. Sticker based model for DNA computation. In: Landweber, L., Baum, E. (Eds.), Proceedings of the Second Annual Workshop on DNA Computing, Princeton University, Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, pp. 1–29.

Perez-Jimenez, M.J., Sancho-Caparrini, F., 2001. Solving knapsack problems in a sticker based model. In: Proceedings of the Seventh Annual Workshop on DNA Computing, DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society.

Paun, G., Rozenberg, G., Salomaa, A., 1998. DNA Computing: New Computing Paradigms. Springer–Verlag, New York.

Chang, W.-L., Guo, M., 2002a. Solving the dominating-set problem in Adleman–Lipton’s Model. In: The Third International Conference on Parallel and Distributed Computing, Applications and Technologies, Japan, pp. 167–172.

Chang, W.-L., Guo, M., 2002b. Solving the clique problem and the vertex-cover problem in Adleman–Lipton’s Model. In: IASTED International Conference on Networks, Parallel and Distributed Processing, and Applications, Japan, pp. 431–436.

Chang, W.-L., Guo, M., 2002c. Solving NP-complete problem in the Adleman–Lipton Model. In: The Proceedings of 2002 International Conference on Computer and Information Technology, Japan, pp. 157–162.

Chang, W.-L., Guo, M., 2002d. Solving the 3-dimensional matching problem and the set packing problem in Adleman–Lipton’s Model. In: IASTED International Conference on Networks, Parallel and Distributed Processing, and Applications, Japan, pp. 455–460.

Fu, B., 1997. Volume bounded molecular computation, Ph.D. Thesis, Department of Computer Science, Yale University.

Braich, R.S., Johnson, C., Rothemund, P.W.K., Hwang, D., Chelyapov, N., Adleman, L.M., 1999. Solution of a satisfiability problem on a gel-based DNA computer. In: Proceedings of the Sixth International Conference on DNA Computation, Springer–Verlag Lecture Notes in Computer Science Series.

Kalim, M., 1998. Restricted genetic alphabet for DNA computing. In: Baum, E.B., Landweber, L.F. (Eds.), DNA Based Computers II: DIMACS Workshop, June 10–12, 1996, vol. 44 of DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, Providence, RI, pp. 243–246.

Cukras, A.R., Faulhammer, D., Lipton, R.J., Landweber, L.F., 1998. Chess games: a model for RNA-based computation. In: Proceedings of the Fourth DIMACS Meeting on DNA Based Computers, University of Pennsylvania, pp. 27–37.

Chang, W.-L., Guo, M., 2004. Using sticker for solving the dominating-set problem in the Adleman–Lipton Model. IEICE Trans. Inf. Syst. E-87D (7), 1782–1788.

Reif, J.H., LaBean, T.H., Seeman, 2000. Challenges and applications for self-assembled DNA-nanostructures. In: Proceedings of the Sixth DIMACS Workshop on DNA Based Computers, Leiden, Holland.

LaBean, T.H., Winfree, E., Reif, J.H., 2000. Experimental progress in computation by self-assembly of DNA tilings. Theor. Comput. Sci. 54, 123–140.

LaBean, T.H., Reif, J.H., 2001. Logical computation using algorithmic self-assembly of DNA triple-crossover molecules. Nature 407, 493–496.

Garzon, M.H., Deaton, R.J., 1999. Biomolecular computing and programming. IEEE Trans. Evolut. Comput. 3, 236–250.

Chang, W.-L., Guo, M., Ho, M., 2003. Solving the set-splitting problem in sticker-based model and the Adleman–Lipton Model. In: The 2003 International Symposium on Parallel and Distributed Processing and Applications, Aizu City, Japan, LNCS 2745, pp. 185–196.