Fast Parallel DNA-based Algorithms for Molecular Computation: the Set-Partition Problem


346 IEEE TRANSACTIONS ON NANOBIOSCIENCE, VOL. 6, NO. 4, DECEMBER 2007


Weng-Long Chang

Abstract—This paper demonstrates that basic biological operations can be used to solve the set-partition problem. To achieve this, we propose three DNA-based algorithms, a signed parallel adder, a signed parallel subtractor, and a signed parallel comparator, that formally verify our designed molecular solutions for solving the set-partition problem.

Index Terms—DNA-based computing, NP-complete problems, NP-hard problems.

I. INTRODUCTION

In 1994, Adleman [1] succeeded in solving an instance of the Hamiltonian path problem in a test tube by manipulating DNA strands. According to [7], the optimal solution of every NP-complete or NP-hard problem is determined from its characteristic. DNA-based algorithms have been proposed for many computational problems, including the set-splitting problem [8], the set-cover problem and the problem of exact cover by 3-sets [9], the dominating-set problem [10], the independent-set problem [11], and the binary integer programming problem [12]. Potentially significant areas of application for DNA algorithms are the breaking of encryption schemes [2], [3] and the construction of DNA databases [13].

The paper is organized as follows. Section II introduces in detail the DNA models of computation proposed by Adleman and his coauthors. Section III introduces the DNA program that solves the set-partition problem from solution spaces of DNA strands. Conclusions are drawn in Section IV.

II. BACKGROUND

A. DNA Manipulations

A tube [1]–[5] is a set of molecules of DNA (a multiset of finite strings over the alphabet {A, C, G, T}). Given a tube, one can perform the following operations.

i. Extract. Given a tube P and a short single strand of DNA, S, the operation produces two tubes $+(P, S)$ and $-(P, S)$, where $+(P, S)$ is all of the molecules of DNA in P which contain S as a substrand and $-(P, S)$ is all of the molecules of DNA in P which do not contain S.

ii. Merge. Given tubes $P_1$ and $P_2$, yield $\cup(P_1, P_2)$, where $\cup(P_1, P_2) = P_1 \cup P_2$.

Manuscript received November 24, 2004; revised October 3, 2006. This work was supported in part by the R.O.C. National Science Council under Grant 95-2221-E-151-034-.

The author is with the Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung 807, Taiwan, R.O.C. (e-mail: changwl@cc.kuas.edu.tw).

Digital Object Identifier 10.1109/TNB.2007.909012

iii. Detect. Given a tube P, if P includes at least one DNA molecule, then we have “yes.” Otherwise, we have “no.”

iv. Discard. Given a tube P, the operation will discard P.

v. Amplify. Given a tube P, the operation Amplify($P$, $P_1$, $P_2$) will produce two new tubes $P_1$ and $P_2$ so that $P_1$ and $P_2$ are each a copy of P ($P_1$ and $P_2$ are now identical) and P becomes an empty tube.

vi. Append. Given a tube P and a short strand of DNA, Z, the operation will append Z onto the end of every strand in P.

vii. Append-head. Given a tube P and a short strand of DNA, Z, the operation will append Z onto the head of every strand in P.

viii. Read. Given a tube P, the operation is used to describe a single molecule, which is contained in tube P.
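The eight operations above can be mimicked in software to make their semantics concrete. The following is a minimal sketch, assuming a tube is modeled as a Python `Counter` (a multiset of strings over A, C, G, T); the function names are illustrative choices, not part of the original model, and Discard is simply dropping the reference to a tube.

```python
from collections import Counter

def extract(tube, s):
    """Return (+(P, S), -(P, S)): strands containing s, and strands that do not."""
    plus = Counter({strand: c for strand, c in tube.items() if s in strand})
    minus = Counter({strand: c for strand, c in tube.items() if s not in strand})
    return plus, minus

def merge(p1, p2):
    """Pour two tubes together (multiset union, multiplicities added)."""
    return p1 + p2

def detect(tube):
    """Report "yes" if the tube holds at least one molecule, else "no"."""
    return "yes" if sum(tube.values()) > 0 else "no"

def amplify(tube):
    """Produce two identical copies of the tube and leave the original empty."""
    p1, p2 = Counter(tube), Counter(tube)
    tube.clear()
    return p1, p2

def append(tube, z):
    """Append strand z onto the end of every strand in the tube."""
    return Counter({strand + z: c for strand, c in tube.items()})

def append_head(tube, z):
    """Append strand z onto the head of every strand in the tube."""
    return Counter({z + strand: c for strand, c in tube.items()})

def read(tube):
    """Describe a single molecule contained in the tube (None if empty)."""
    return next(iter(tube.elements()), None)
```

For instance, `extract(Counter(["ACGT", "TTGA", "ACGT"]), "CG")` separates the two copies of `ACGT` from the single `TTGA`.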

III. MOLECULAR SOLUTIONS OF THE SET-PARTITION PROBLEM

A. The Introduction of the Set-Partition Problem

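Assume that a finite set $S$ is $\{s_1, \ldots, s_n\}$, where $s_m$ is the $m$th element for $1 \le m \le n$. Also suppose that every element in $S$ is a positive integer. Assume that $|S|$ is the number of elements in $S$ and is equal to $n$. The set-partition problem is to determine whether there is a subset $T \subseteq S$ such that

$$\sum_{x \in T} x = \sum_{y \in \overline{T}} y,$$

where $T \cup \overline{T} = S$ and $T \cap \overline{T} = \emptyset$.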

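Suppose that a finite set $S$ has three elements. Then $S$ has eight subsets $T$, and each subset has a corresponding subset $\overline{T}$ holding the remaining elements, which gives eight pairs $(T, \overline{T})$. Computing the two sums of each pair shows which pairs have equal sums, and those pairs are the solution of the set-partition problem for $S$.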
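To make the enumeration of pairs concrete, the sketch below lists every subset, its complement, and both sums, and then keeps the pairs whose sums agree. The three-element set {1, 2, 3} used in the usage note is a hypothetical instance chosen for illustration, and the function names are assumptions of this sketch.

```python
from itertools import combinations

def partition_pairs(s):
    """Yield (T, complement of T, sum of T, sum of complement) for every subset T."""
    items = list(s)
    positions = range(len(items))
    for size in range(len(items) + 1):
        for chosen in combinations(positions, size):
            t = [items[i] for i in chosen]
            t_bar = [items[i] for i in positions if i not in chosen]
            yield t, t_bar, sum(t), sum(t_bar)

def solutions(s):
    """Return the pairs (T, complement of T) whose sums are equal."""
    return [(t, t_bar) for t, t_bar, a, b in partition_pairs(s) if a == b]
```

For the hypothetical set `[1, 2, 3]` there are eight pairs, and `solutions([1, 2, 3])` returns the two pairs `([3], [1, 2])` and `([1, 2], [3])`.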

B. A Pseudoalgorithm for Solving the Set-Partition Problem

From the definition of the set-partition problem in Section III-A, the expression $\sum_{x \in T} x = \sum_{y \in \overline{T}} y$ can be transformed into the form $\sum_{x \in T} x - \sum_{y \in \overline{T}} y = 0$.

The following pseudoalgorithm is used to solve the set-partition problem.

Method 1: Solving the set-partition problem.

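(1) Every computation of $\sum_{x \in T} x - \sum_{y \in \overline{T}} y$ for each pair $(T, \overline{T})$ is simultaneously performed on a molecular computer.

(2) On a molecular computer, search for the answer, $\sum_{x \in T} x - \sum_{y \in \overline{T}} y = 0$, among the results generated by Step (1).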
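The two steps of Method 1 can be sketched sequentially as follows. This is an illustration only: a molecular computer would perform Step (1) on all pairs at once, whereas the loop below visits them one by one. The bit-mask encoding (bit k set means the corresponding element is in the subset, otherwise in its complement) and the function names are assumptions of this sketch.

```python
def signed_differences(s):
    """Step (1): for every subset mask, add elements in T and subtract the rest."""
    n = len(s)
    result = {}
    for mask in range(1 << n):
        diff = 0
        for k in range(n):
            if (mask >> k) & 1:
                diff += s[k]   # element k is in the subset T
            else:
                diff -= s[k]   # element k is in the complement of T
        result[mask] = diff
    return result

def zero_masks(s):
    """Step (2): search the results of Step (1) for a difference of zero."""
    return [mask for mask, diff in signed_differences(s).items() if diff == 0]
```

With the hypothetical input `[1, 2, 3]`, `zero_masks` returns `[3, 4]`, that is, the masks `0b011` and `0b100`.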


C. A Library for Solving the Set-Partition Problem

Assume that an $n$-bit binary number, $x_n \ldots x_1$, is applied to represent the $n$ elements in a finite set $S$, where the value of each bit $x_k$ is either 1 or 0 for $1 \le k \le n$. From [6], for every bit $x_k$ representing the $k$th element in $S$, for $1 \le k \le n$, two distinct 15-base value sequences are designed. For convenience, assume that $x_k^1$ denotes the value of $x_k$ to be 1 and $x_k^0$ denotes the value of $x_k$ to be 0. Each of the $2^n$ different values encoding each pair of subsets $(T, \overline{T})$ is represented by a library sequence of $15n$ bases consisting of the concatenation of one value sequence for each bit. Library sequences are also termed library strands, and a combinatorial pool containing library strands is termed a library. The following algorithm is used to construct a library to solve the set-partition problem.

Procedure Init($P_0$)

(1) For $k = 1$ to $n$

(1a) Amplify($P_0$, $P_1$, $P_2$).

(1b) Append-head($P_1$, $x_k^1$).

(1c) Append-head($P_2$, $x_k^0$).

(1d) $P_0 = \cup(P_1, P_2)$.

EndFor

EndProcedure

Lemma 1: A library for solving the set-partition problem can be constructed from the algorithm Init.

Proof: Each execution of Step (1a) amplifies tube $P_0$ and generates two new tubes, $P_1$ and $P_2$, which are copies of $P_0$, and tube $P_0$ becomes empty. Then each execution of Step (1b) appends a DNA sequence, representing the value 1 for $x_k$, onto the head of every strand in tube $P_1$. This means that the $k$th element in a finite set $S$ appears in tube $P_1$ and is in a subset $T$ of $S$ but not in the corresponding subset $\overline{T}$. Each execution of Step (1c) appends a DNA sequence, representing the value 0 for $x_k$, onto the head of every strand in tube $P_2$. This implies that the $k$th element in $S$ does not appear in tube $P_2$ and is not in the subset $T$ of $S$ but in the corresponding subset $\overline{T}$. Next, each execution of Step (1d) pours tubes $P_1$ and $P_2$ into tube $P_0$. This indicates that DNA strands in tube $P_0$ include DNA sequences of $x_k = 1$ and $x_k = 0$. After repeated execution of Steps (1a) through (1d), tube $P_0$ finally consists of $2^n$ library sequences encoding the $2^n$ pairs $(T, \overline{T})$.
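The doubling argument in the proof can be checked with a small simulation. The sketch below assumes that a strand is modeled as a tuple of symbolic labels such as `x2^1` in place of actual 15-base value sequences, that the initial tube holds one empty strand, and that elements are processed in increasing order; these are modeling assumptions for illustration, as is the function name.

```python
def init_library(n):
    """Build all 2**n library strands by amplify, append-head, and merge."""
    tube = [()]  # assumption: start from a single empty strand
    for k in range(1, n + 1):
        # Amplify: two copies of the tube, the original becomes empty.
        t1, t2 = list(tube), list(tube)
        tube = []
        # Append-head: mark element k as present (1) in one copy, absent (0) in the other.
        t1 = [(f"x{k}^1",) + strand for strand in t1]
        t2 = [(f"x{k}^0",) + strand for strand in t2]
        # Merge: pour both copies back into the working tube.
        tube = t1 + t2
    return tube
```

Each pass doubles the number of strands, so `init_library(3)` holds eight distinct strands, one per pair of a subset and its complement.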

D. Solution Space of the Value for Every Element of Each Subset for Solving the Set-Partition Problem of a Finite Set

The value of an element for 1 in an $n$-element finite set $S$ can be represented as a signed binary number, . The bit is a sign bit: the value 0 represents a positive sign and the value 1 represents a negative sign. The bit is the highest-order bit and the bit is the lowest-order bit. From [6], for every bit to 1 , two distinct DNA sequences are designed. For convenience, assume that denotes the value of to be 1 and denotes the value of to be 0. The following algorithm is proposed to construct library sequences encoding the value of each element in every subset from tube , generated by the algorithm Init.

Procedure Value

(1) For to

(1a) and .

(1b) For to

(1c) Append-head .

(1d) Append-head .

EndFor

(1e) Append-head .

(1f) Append-head .

(1g) .

EndFor

EndProcedure

Lemma 2: Library sequences encoding the value of each element in every subset of a finite set $S$ can be constructed from the algorithm Value.

Proof: Refer to Lemma 1.

E. A Library Sequence of an Initial Value to Computation of the Sum for Elements in Each Subset in a Finite Set

From the definition of the set-partition problem in Sections III-A and III-B, an adder and a subtractor are used repeatedly to compute the sum for elements in each subset of an $n$-element finite set $S$. Assume that is used to represent the sum of elements in each subset of $S$. Also suppose that the length of is bits and is represented as a -bit binary number, , where the value of each bit is either 1 or 0 for and . The bit is a sign bit: the value 0 represents a positive sign and the value 1 represents a negative sign. The bits and represent the most significant bit and the least significant bit for , respectively. If the th update of is performed through an adder, then two binary numbers and are used to represent the augend and the sum of the th update, respectively. If the th update of is performed through a subtractor, then two binary numbers and are used to represent the minuend and the difference of the th update, respectively. From [6], for every bit , two distinct 15-base value sequences are designed. For convenience, assume that denotes the value of to be 1 and denotes the value of to be 0. The following algorithm is used to construct a library sequence that encodes an initial value for the computation of the sum for elements in every pair of subsets of a finite set $S$.

Procedure InitialValue

(1) For to

(1a) Append-head .

EndFor

EndProcedure

Lemma 3: Library strands for initial values for the computation of the sum for elements in every pair of subsets of a finite set $S$ can be constructed from the algorithm InitialValue.

Proof: Similar to Lemma 1.

F. The Construction of a Parallel One-Bit Comparator

The following algorithm, OneBitComparator, is presented to perform the function of a one-bit parallel comparator.

Procedure OneBitComparator

(1) and .

(2) and .

(3) and .

(4) and .

(5) and .

(6) and .

(7)

(8) .

(9) .

(10) .

(11) .

(12) .

EndProcedure

Lemma 4: The algorithm OneBitComparator can be applied to perform the function of a one-bit parallel comparator.

Proof: Steps (1) through (3) employ the extract operations to form six test tubes. Tube includes library sequences that have , tube consists of library strands that have , tube includes library sequences that have and , tube consists of library strands that have and , tube includes library sequences that have and , and tube consists of library strands that have and . Next, Steps (4) through (6) also apply the extract operations to form six test tubes. Tube includes library sequences that have , tube consists of library strands that have , tube includes library sequences that have and , tube consists of library strands that have and , tube includes library sequences that have and , and tube consists of library strands that have and . Steps (7) through (12) use the merge operations to pour tubes and into tube , to pour tubes and into tube , to pour tube into tube , to pour tube into tube , to pour tube into tube , and to pour tube into tube .

From the algorithm OneBitComparator, it takes six extract operations, six merge operations, and 18 test tubes to perform the function of a one-bit parallel comparator.
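The extract-then-merge pattern of a one-bit comparator can be illustrated in software. The sketch below is a simplified analogue rather than the procedure above: it assumes a single input tube, models each strand as a dictionary holding two bit lists under the hypothetical keys `a` and `b`, and routes every strand by comparing bit `k` of the two operands. All names are assumptions of this sketch.

```python
def one_bit_comparator(tube, k):
    """Route each strand to 'greater', 'equal', or 'less' by bit k of a and b."""
    # Extract-style splits: first on bit k of a, then on bit k of b in each half.
    a_one = [s for s in tube if s["a"][k] == 1]
    a_zero = [s for s in tube if s["a"][k] == 0]
    a1_b1 = [s for s in a_one if s["b"][k] == 1]
    a1_b0 = [s for s in a_one if s["b"][k] == 0]
    a0_b1 = [s for s in a_zero if s["b"][k] == 1]
    a0_b0 = [s for s in a_zero if s["b"][k] == 0]
    # Merge-style recombination into the three outcome tubes.
    return {"greater": a1_b0, "equal": a1_b1 + a0_b0, "less": a0_b1}
```

Because every strand in the tube is handled by the same splits, all comparisons at bit `k` are decided in one pass, which is the sense in which the comparator is parallel.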

G. The Construction of a Signed Parallel Comparator

The following algorithm, ParallelComparator, is proposed to perform the function of a -bit signed parallel comparator.

Procedure ParallelComparator

(1) and .

(2) and .

(3) and .

(4) For to 1

(4a) OneBitComparator .

(4b) If ((Detect “no”) and (Detect “no”)) then

(4c) Terminate the execution of the loop. EndIf

EndFor

(5) .

(6) .

EndProcedure

Lemma 5: The algorithm ParallelComparator can be used to perform the function of a -bit signed parallel comparator.

Proof: Steps (1) through (3) employ the extract operations to form six test tubes. Tube includes library sequences that have , tube consists of library strands that have , tube includes library sequences that have and , tube consists of library strands that have and , tube includes library sequences that have and , and tube consists of library strands that have and . The only loop is used to implement the function of a -bit signed parallel comparator. [...]

[...] produces library sequences in tube that perform computation of the sum for $2^n$ pairs of subsets, .

Next, Step (5) is a single loop and is mainly used to find the answer for the set-partition problem for a finite $n$-element set $S$. Each execution of Step (5a) applies the extract operation to form two tubes: and . Tube contains library sequences that have and tube includes library sequences that have . Next, each execution of Step (5b) uses the discard operation to discard tube . After repeated execution of Steps (5a) through (5b), tube finally holds the library sequences that encode any answer for the set-partition problem for $S$. Then Step (6) employs the detect operation to check whether tube contains any library sequence. If it returns “yes,” then Step (6a) uses the read operation to read the solution for the set-partition problem for $S$. Therefore, any solution for the set-partition problem for a finite $n$-element set $S$ can be computed from the steps in Algorithm 1.

O. The Complexity of Solving the Set-Partition Problem for a Finite n-Element Set

Theorem 2: The set-partition problem for a finite $n$-element set $S$ can be solved with ) biological operations, ) library sequences, tubes, and the longest library strand from solution space of library sequences, where the number of bits for the value of each element in $S$ is bits.

Proof: Refer to Algorithm 1.

IV. CONCLUSION

In this paper, the first DNA algorithm of a signed parallel adder, the first DNA algorithm of a signed parallel subtractor, and the first DNA algorithm of a signed parallel comparator are proposed to perform signed parallel addition, signed parallel subtraction, and signed parallel comparison, respectively. Currently the future of molecular computers is unclear. It is possible that in the future molecular computers will be the clear choice for performing massively parallel computations. However, there are still many technical difficulties to overcome before this becomes a reality. We hope that this paper helps to demonstrate that molecular computing is a technology worth pursuing.

REFERENCES

[1] L. Adleman, “Molecular computation of solutions to combinatorial problems,” Science, vol. 266, pp. 1021–1024, 1994.

[2] W.-L. Chang, M. Guo, and M. Ho, “Fast parallel molecular algorithms for DNA-based computation: Factoring integers,” IEEE Trans. Nanobiosci., vol. 4, no. 2, pp. 149–163, Jun. 2005.

[3] W.-L. Chang, M. Ho, and M. Guo, “Molecular solutions for the subset-sum problem on DNA-based supercomputing,” BioSystems, vol. 73, no. 2, pp. 117–130, 2004.

[4] W.-L. Chang and M. Guo, “Solving the set-cover problem and the problem of exact cover by 3-sets in the Adleman-Lipton model,” BioSystems, vol. 72, no. 3, pp. 263–275, 2003.

[5] G. Paun, G. Rozenberg, and A. Salomaa, DNA Computing: New Computing Paradigms. New York: Springer-Verlag, 1998.

[6] L. M. Adleman, R. S. Braich, C. Johnson, P. W. K. Rothemund, D. Hwang, and N. Chelyapov, “Solution of a 20-variable 3-SAT problem on a DNA computer,” Science, vol. 296, no. 5567, pp. 499–502, 2002.

[7] M. Guo, W.-L. Chang, M. Ho, J. Lu, and J. Cao, “Is optimal solution of every NP-complete or NP-hard problem determined from its characteristic for DNA-based computing,” BioSystems, vol. 80, no. 1, pp. 71–82, 2005.

[8] W.-L. Chang, M. Guo, and M. Ho, “Towards solution of the set-splitting problem on gel-based DNA computing,” Future Gener. Comput. Syst., vol. 20, no. 5, pp. 875–885, Jun. 15, 2004.

[9] W.-L. Chang and M. Guo, “Solving the set-cover problem and the problem of exact cover by 3-sets in the Adleman-Lipton’s model,” BioSystems, vol. 72, no. 3, pp. 263–275, 2003.

[10] W.-L. Chang, M. Ho, and M. Guo, “Fast parallel molecular solution to the dominating-set problem on massively parallel bio-computing,” Parallel Comput., vol. 30, no. 9–10, pp. 1109–1125, 2004.

[11] W.-L. Chang, M. Guo, and J. Wu, “Solving the independent-set problem in a DNA-based supercomputer model,” Parallel Process. Lett., vol. 15, no. 4, pp. 469–480, Dec. 2005.

[12] C.-W. Yeh, C.-P. Chu, and K.-R. Wu, “Molecular solutions to the binary integer programming problem based on DNA computation,” BioSystems, vol. 83, no. 1, pp. 56–66, Jan. 2006.

[13] A. Schuster, “DNA databases,” BioSystems, vol. 81, pp. 234–246, 2005.

Weng-Long Chang received the Ph.D. degree in computer science and information engineering from National Cheng Kung University, Taiwan, R.O.C., in 1999.

He is currently an Associate Professor at National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan. His research interests include the design of DNA-based algorithms on molecular computing, and languages and compilers for parallel computing.