Fast Parallel Molecular Algorithms for DNA-based Computation: Factoring Integers


Weng-Long Chang, Minyi Guo, Member, IEEE, and Michael Shan-Hui Ho*

Abstract—The RSA public-key cryptosystem is an algorithm that converts input data into an unrecognizable encrypted form and converts the encrypted data back into its original form. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates how to factor the product of two large prime numbers with basic biological operations on a molecular computer, which is a breakthrough for such operations. To achieve this, we propose three DNA-based algorithms (a parallel subtractor, a parallel comparator, and parallel modular arithmetic) and formally verify our molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that public-key cryptosystems may be insecure, and it presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.

Index Terms—Biological parallel computing, DNA-based algorithms, DNA-based computing, factoring integers, RSA public-key cryptosystem.

I. INTRODUCTION

THE RSA public-key cryptosystem [34] is an algorithm that converts input data into an unrecognizable encrypted form and converts the encrypted data back into its original form. The construction of the RSA public-key cryptosystem relies on the ease of finding large prime numbers, whereas its security relies on the difficulty of factoring the product of two large prime numbers. RSA is the most popular public-key cryptosystem, and no known method can break it in a reasonable amount of time.

Feynman proposed molecular computation in 1961, but his idea was not realized experimentally for several decades [37]. In 1994, Adleman [2] succeeded in solving an instance of the Hamiltonian path problem in a test tube just by handling DNA strands. Lipton [3] demonstrated that Adleman's techniques could be used to solve the satisfiability problem (the first problem shown to be NP-complete). Adleman et al. [14] proposed the sticker-based model to reduce the error rate of hybridization.

Manuscript received April 7, 2004; revised November 25, 2004. This work was supported in part by the Republic of China National Science Foundation under Grant NSC-92-2213-E-218-026 and Japan JSPS Grant-in-Aid for Scientific Research under Grant (C) 14580386. Asterisk indicates corresponding author.

W.-L. Chang and M. S.-H. Ho are with the Department of Information Management, Southern Taiwan University of Technology, Tainan, Taiwan, R.O.C. (E-mail: changwl@mail.stut.edu.tw; MHoInCerritos@yahoo.com).

*M. Guo is with the School of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu City 965-8580, Japan (E-mail: minyi@u-aizu.ac.jp).

Digital Object Identifier 10.1109/TNB.2005.850474

Through advances in molecular biology [1], it is now possible to produce roughly $10^{18}$ DNA strands that fit in a test tube. Those $10^{18}$ DNA strands can also be used to represent $10^{18}$ bits of information. If, in the future (perhaps after many years), biological operations can be applied to a tube with $10^{18}$ DNA strands and run without errors, then $10^{18}$ bits of information can be processed correctly and simultaneously. Hence, biological computing may eventually provide a huge amount of parallelism for many computationally intensive real-world problems.

The fastest supercomputers currently available can execute approximately $10^{12}$ integer operations per second. With 128-bit operands, this implies that $128 \times 10^{12}$ bits of information can be processed in a second, or $128 \times 10^{15}$ bits in 1000 s. The extract operation is one of the basic biological operations with the longest execution time; an extract operation takes approximately 1000 s [12]. If, in the future, an extract operation can be applied to a tube with $10^{18}$ DNA strands and run without errors, then $10^{18}$ bits of information can be processed correctly and simultaneously in 1000 s. If this becomes true, basic biological operations may become faster than the fastest supercomputers. In [12], it was pointed out that storing information in molecules of DNA allows for an information density of approximately 1 bit/nm$^3$. Videotape, a traditional storage medium, has an information density of approximately 1 bit/$10^{12}$ nm$^3$. DNA therefore offers a far higher information density than traditional storage media.
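The throughput comparison above is plain arithmetic. The sketch below reproduces it under the round figures used in this paragraph: about $10^{12}$ integer operations per second on 128-bit words for a supercomputer, and $10^{18}$ strands handled by one error-free extract lasting about 1000 s, counting one bit per strand. These inputs are assumptions for illustration, not measurements, and the constant names are hypothetical.

```python
# Round figures assumed from the text; illustrative only, not measured values.
SUPER_OPS_PER_SECOND = 10**12   # integer operations per second (fastest supercomputer)
BITS_PER_OPERATION = 128        # word width implied by the "128 x" factor
EXTRACT_SECONDS = 1000          # approximate duration of one extract operation [12]
STRANDS_PER_TUBE = 10**18       # strands assumed to fit in one test tube


def bits_processed_by_supercomputer(seconds: int) -> int:
    """Bits handled by the supercomputer in the given number of seconds."""
    return SUPER_OPS_PER_SECOND * BITS_PER_OPERATION * seconds


def bits_processed_by_extract() -> int:
    """Bits handled by one error-free extract, counting one bit per strand."""
    return STRANDS_PER_TUBE


silicon = bits_processed_by_supercomputer(EXTRACT_SECONDS)  # 128 * 10**15
dna = bits_processed_by_extract()                            # 10**18
print(silicon, dna, dna / silicon)
```

Under these assumptions one extract covers roughly eight times as many bits as the supercomputer does in the same 1000 s; the comparison rests entirely on the error-free assumption.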

In this paper, we first construct solution spaces of DNA strands for encoding every integer of $k$ bits. Using basic biological operations, we then develop DNA-based algorithms for a parallel subtractor, a parallel comparator, and a parallel divider to factor the product of two large prime numbers of $k$ bits. We also show that cryptosystems based on the dramatic difference between the ease of finding large prime numbers of $k$ bits and the difficulty of factoring the product of two such primes can be broken. Furthermore, this work presents clear evidence of the ability of molecular computing to perform parallel mathematical operations.

The rest of this paper is organized as follows. Section II first introduces the DNA models of computation proposed by Adleman et al. and compares them with other models. Section III introduces the DNA program that factors the product of two large prime numbers of $k$ bits using solution spaces of DNA strands. The discussion and conclusions are presented in Section IV and Section V, respectively.


Fig. 1. A schematic representation of a nucleotide.

II. BACKGROUND

In this section, we review the basic structure of the DNA molecule and then discuss available techniques for manipulating DNA that will be used to solve the problem of factoring integers. We also compare several well-known DNA models.

A. The Structure of DNA

From [1], [16], DNA (deoxyribonucleic acid) is the molecule that plays the main role in DNA-based computing. In the biochemical world of large and small molecules, polymers, and monomers, DNA is a polymer strung together from monomers called deoxyribonucleotides. Each deoxyribonucleotide contains three components: a sugar, a phosphate group, and a nitrogenous base. The sugar has five carbon atoms, and for the sake of reference there is a fixed numbering of them. Because the base also has carbons, the carbons of the sugar are numbered from 1' to 5' (rather than from 1 to 5) to avoid confusion. The phosphate group is attached to the 5' carbon, and the base is attached to the 1' carbon. Within the sugar structure there is a hydroxyl group attached to the 3' carbon.

Distinct nucleotides are distinguished only by their bases, which come in two sorts: purines and pyrimidines. Purines include adenine and guanine, abbreviated A and G. Pyrimidines include cytosine and thymine, abbreviated C and T. Because nucleotides are distinguished solely by their bases, they are simply represented as A, G, C, or T nucleotides, depending on the kind of base they have. The structure of a nucleotide, cited from [16], is illustrated (in a very simplified way) in Fig. 1. In Fig. 1, B is one of the four possible bases (A, G, C, or T), P is the phosphate group, and the rest (the “stick”) is the sugar (with its carbons enumerated 1' through 5').

Nucleotides can be linked together in two different ways [1], [16]. In the first, the 5'-phosphate group of one nucleotide is joined with the 3'-hydroxyl group of the other, forming a phosphodiester bond. The resulting molecule has the 5'-phosphate group of one nucleotide, denoted as the 5' end, and the 3'-OH group of the other nucleotide, denoted as the 3' end, available for bonding. This gives the molecule directionality, and we can talk about the direction from the 5' end to the 3' end or from the 3' end to the 5' end. In the second, the base of one nucleotide interacts with the base of another to form hydrogen bonds. This bonding is subject to the following restriction on base pairing: A and T can pair together, and C and G can pair together; no other pairings are possible. This pairing principle is called Watson–Crick complementarity (named after J. D. Watson and F. H. C. Crick, who deduced the famous double helix structure of DNA in 1953 and won the Nobel Prize for the discovery).

A DNA strand is essentially a sequence (polymer) of four types of nucleotides, distinguished by the base each contains. Two strands of DNA can form (under appropriate conditions) a double strand if the respective bases are Watson–Crick complements of each other (A matches T and C matches G) and the 3' end of one strand faces the 5' end of the other. The length of a single-stranded DNA is the number of nucleotides composing the single strand. Thus, if a single-stranded DNA includes 20 nucleotides, we say that it is a 20-mer (i.e., a polymer containing 20 monomers). The length of a double-stranded DNA (where each nucleotide is base paired) is counted in base pairs. Thus, if we make a double-stranded DNA from a single-stranded 20-mer, the length of the double-stranded DNA is 20 base pairs, also written 20 bp. Hybridization is the technical term for the pairing of two single DNA strands to form a double helix; it takes advantage of the specificity of DNA base pairing to detect specific DNA strands. (For more discussion of the relevant biological background, refer to [1] and [16].)
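To make the pairing rule concrete, the following Python sketch computes the Watson–Crick partner of a strand. It assumes strands are written 5' to 3' as strings over A, C, G, T; because paired strands are antiparallel, the partner written 5' to 3' is the complement read in reverse. The function names and the example 8-mer are hypothetical.

```python
# Watson-Crick pairing rule: A pairs with T, C pairs with G; nothing else pairs.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base Watson-Crick complement, position for position."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner strand written 5'->3'.

    The two strands of a duplex are antiparallel (the 3' end of one faces the
    5' end of the other), so the partner read 5'->3' is the complement reversed.
    """
    return complement(strand)[::-1]


def forms_duplex(top: str, bottom: str) -> bool:
    """True if `bottom` (5'->3') pairs perfectly with `top` (5'->3') along its length."""
    return len(top) == len(bottom) and reverse_complement(top) == bottom


probe = "ATTGCAGC"                  # hypothetical 8-mer, written 5'->3'
partner = reverse_complement(probe)
print(partner, forms_duplex(probe, partner), len(probe))  # GCTGCAAT True 8
```

The resulting duplex of two 8-mers is 8 bp long, matching the base-pair counting convention above.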

B. Adleman’s Experiment for Solving the Hamiltonian Path Problem

Assume a directed graph $G = (V, E)$, where $V$ and $E$ are the set of vertices and the set of edges, respectively. In general, the Hamiltonian path problem consists of deciding whether $G$ has a Hamiltonian path. $G$ with designated vertices $v_{in}$ and $v_{out}$ is said to have a Hamiltonian path if and only if there exists a sequence of compatible “one-way” edges (that is, a path) that begins at $v_{in}$, ends at $v_{out}$, and enters every other vertex exactly once [2].

Adleman's experiment solved the Hamiltonian path problem for a directed graph $G = (V, E)$, where $V = \{0, 1, \ldots, 6\}$, $v_{in} = 0$, $v_{out} = 6$, and the edge set $E$ is given in [2]. The first step of Adleman's experiment is to generate random paths through $G$. To generate random paths, each vertex $i$ in $V$, for $0 \le i \le 6$, was associated with a random 20-mer sequence of DNA denoted $O_i$. For each edge $i \rightarrow j$ in $E$, an oligonucleotide $O_{i \rightarrow j}$ was created, consisting of the 3' 10-mer of $O_i$ (unless $i = 0$, in which case it was all of $O_i$) followed by the 5' 10-mer of $O_j$ (unless $j = 6$, in which case it was all of $O_j$). The 20-mer sequence Watson–Crick complementary to $O_i$ was denoted $\bar{O}_i$. For each vertex $i$ in $V$ (except $i = 0$ and $i = 6$) and for each edge $i \rightarrow j$ in $E$, large quantities of oligonucleotides $\bar{O}_i$ and $O_{i \rightarrow j}$ were mixed together in a single ligation reaction. Here the $\bar{O}_i$ oligonucleotides served as splints to bring oligonucleotides associated with compatible edges together for ligation. Consequently, the ligation reaction formed DNA molecules that can be viewed as encoding random paths through $G$. From the random paths generated, basic biological operations are applied to remove illegal paths and select a Hamiltonian path [2].
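The edge encoding can be checked in silico. This Python sketch assumes $v_{in} = 0$ and $v_{out} = 6$ as in [2], but the vertex sequences are random placeholders rather than Adleman's actual sequences, and the edges used (1→2, 2→3, 0→1, 5→6) are illustrative. It shows why a splint for $O_2$ can bridge two consecutive edge oligonucleotides: the 10 bases on either side of their junction spell out $O_2$.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def watson_crick_partner(seq: str) -> str:
    """Complementary strand written 5'->3' (reverse complement); the splint for seq."""
    return seq.translate(COMPLEMENT)[::-1]


V_IN, V_OUT = 0, 6
rng = random.Random(1994)  # fixed seed; made-up sequences, not those used in [2]
O = {i: "".join(rng.choice("ACGT") for _ in range(20)) for i in range(7)}


def edge_oligo(i: int, j: int) -> str:
    """O_{i->j}: 3' 10-mer of O_i followed by 5' 10-mer of O_j (both written 5'->3')."""
    left = O[i] if i == V_IN else O[i][10:]    # whole O_i for the start vertex
    right = O[j] if j == V_OUT else O[j][:10]  # whole O_j for the end vertex
    return left + right


# Consecutive edges 1->2 and 2->3 meet at vertex 2: the junction spells out O_2,
# so the splint (partner of O_2) can hold both oligonucleotides for ligation.
e12, e23 = edge_oligo(1, 2), edge_oligo(2, 3)
junction = e12[-10:] + e23[:10]
assert junction == O[2]
splint = watson_crick_partner(O[2])
print(len(e12), len(edge_oligo(0, 1)), len(edge_oligo(5, 6)))  # 20 30 30
```

Interior edges give 20-mers, while edges touching $v_{in}$ or $v_{out}$ give 30-mers because they carry the whole start or end vertex sequence.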

C. The Sticker-Based Model

The sticker-based model employs two basic groups of single-stranded DNA molecules in its representation of a bit string [14]. Consider a memory strand $N$ bases in length subdivided into $K$ nonoverlapping regions, each $M$ bases long (thus, $N \ge K \times M$). Each region is identified with exactly one bit position (or, equivalently, one Boolean variable) during the course of the computation. Adleman et al. [14] also designed $K$ different sticker strands, or simply stickers. Each sticker is $M$ bases long and is complementary to one and only one of the $K$ memory regions. If a sticker is annealed to its matching region on a given memory strand, then the bit corresponding to that region is on for that strand. If no sticker is annealed to a region, then that region's bit is off. Each memory strand, along with its annealed stickers (if any), represents one bit string. Such partial duplexes are called memory complexes. A large set of bit strings is represented by a large number of identical memory strands, each of which has stickers annealed only at the required bit positions. Such a collection of memory complexes is called a tube.

Fig. 2. An example of a sticker memory.

Fig. 3. Examples of memory complexes.

In this model, each possible bit string is represented by a unique association of memory strands and stickers. In the illustration in Fig. 2, we consider a memory strand of length $N$ divided into $K = 4$ regions, each of length $M$. Thus, in this case, the memory complexes contain four bits of information. In particular, consider the memory complexes depicted in Fig. 3. In the first memory complex, all regions are off, whereas in the last complex the last two regions are on. The binary numbers represented by these four memory complexes are 0000, 0100, 1001, and 0011, respectively.
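The bit-string reading of a memory complex takes only a few lines to model. In this hypothetical Python sketch, a complex is the set of 1-based region indices that carry a sticker, with region 1 as the leftmost bit (an assumed convention); the sketch reproduces the four complexes of Fig. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryComplex:
    """A memory strand with K regions plus the set of regions carrying a sticker."""

    K: int                       # number of M-base regions (bits) on the memory strand
    on: frozenset = frozenset()  # 1-based regions with an annealed sticker

    def anneal(self, region: int) -> "MemoryComplex":
        """Return a new complex with the sticker for `region` annealed (bit turned on)."""
        if not 1 <= region <= self.K:
            raise ValueError("region out of range")
        return MemoryComplex(self.K, self.on | {region})

    def bits(self) -> str:
        """Bit string read left to right: '1' where a sticker is annealed, else '0'."""
        return "".join("1" if r in self.on else "0" for r in range(1, self.K + 1))


blank = MemoryComplex(4)
fig3 = [blank, blank.anneal(2), blank.anneal(1).anneal(4), blank.anneal(3).anneal(4)]
print([c.bits() for c in fig3])  # ['0000', '0100', '1001', '0011']
```

The complex is immutable, so annealing returns a new object; this mirrors the fact that each memory complex in a tube represents exactly one bit string.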

D. Adleman's Experiment for Solution of a Satisfiability Problem

Adleman et al. [22], [46] performed experiments that solved, respectively, a six-variable 11-clause formula and a 20-variable 24-clause three-conjunctive normal form (3-CNF) formula. A Lipton encoding [3] was used to represent all possible variable assignments for the chosen six-variable or 20-variable SAT problem. For each of the six variables $x_k$, two distinct 15-base value sequences were designed: one represents true, $X_k^T$, and the other represents false, $X_k^F$, for $1 \le k \le 6$. Each of the $2^6$ truth assignments was represented by a library sequence of 90 bases consisting of the concatenation of one value sequence for each variable. DNA molecules with library sequences are termed library strands, and a combinatorial pool containing library strands is termed a library. The six-variable library strands were synthesized by employing a mix-and-split combinatorial synthesis technique [24]. The library strands were assigned library sequences with $x_1$ at the 5'-end and $x_6$ at the 3'-end. Thus, synthesis began by assembling the two 15-base oligonucleotides with sequences $X_6^F$ and $X_6^T$. This process was repeated until all six variables had been treated.
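The mix-and-split construction can be mimicked in Python to confirm the library's size and strand length. This is a sketch under stated assumptions: the value sequences are deterministic placeholders (base-4 digits of an index) that are merely distinct, not the computer-designed sequences of [22], and the split/extend/mix rounds are modeled by prepending one value sequence per variable, from $x_6$ toward the 5' end.

```python
from itertools import product

N_VARS = 6
SEQ_LEN = 15


def placeholder_sequence(index: int, length: int = SEQ_LEN) -> str:
    """Distinct stand-in for a designed value sequence (base-4 digits of index).

    Real designs were computer-generated to satisfy hybridization constraints [22];
    these placeholders only guarantee distinctness.
    """
    digits = []
    for _ in range(length):
        digits.append("ACGT"[index % 4])
        index //= 4
    return "".join(digits)


# X[k]["T"] and X[k]["F"] play the roles of X_k^T and X_k^F.
X = {k: {"T": placeholder_sequence(2 * k + 1), "F": placeholder_sequence(2 * k)}
     for k in range(1, N_VARS + 1)}


def mix_and_split_library(n_vars: int) -> list:
    """Grow strands toward the 5' end: start at x_n, then repeatedly split the pool,
    extend one half with X_k^T and the other with X_k^F, and mix them again."""
    pool = [""]
    for k in range(n_vars, 0, -1):          # x_6 first, x_1 last (5' end)
        pool = [X[k][v] + s for s in pool for v in ("T", "F")]
    return pool


library = mix_and_split_library(N_VARS)
expected = {"".join(X[k][v] for k, v in zip(range(1, N_VARS + 1), assignment))
            for assignment in product("TF", repeat=N_VARS)}
assert set(library) == expected             # one strand per truth assignment
print(len(library), {len(s) for s in library})  # 64 {90}
```

Each round doubles the pool, so six rounds give the $2^6 = 64$ library strands of $6 \times 15 = 90$ bases described above.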

The probes used for separating the library strands have sequences complementary to the value sequences. Errors in the separation of the library strands are errors in the computation. Sequences must be designed to ensure that library strands have little secondary structure that might inhibit intended probe–library hybridization. The design must also exclude sequences that might encourage unintended probe–library hybridization. To help achieve these goals, sequences were computer-generated to satisfy the seven constraints proposed in [22]. A similar method was also applied to solve a 20-variable 3-SAT instance [46].

E. DNA Manipulations

In the past decade, there have been revolutionary advances in the field of biomedical engineering, particularly in recombinant DNA and RNA manipulation. Due to the industrialization of the biotechnology field, laboratory techniques for recombinant DNA and RNA manipulation are becoming highly standardized. Basic principles of recombinant DNA can be found in [47]–[50]. In this subsection, we describe eight biological operations that are useful for solving the problem of factoring integers. The method of constructing the DNA solution space for the problem of factoring integers is based on the method proposed in [22], [46].

A (test) tube is a set of molecules of DNA (a multiset of finite strings over the alphabet $\{A, C, G, T\}$). Given a tube, one can perform the following operations.

1. Extract. Given a tube $T$ and a short single strand of DNA, $S$, the operation produces two tubes $+(T, S)$ and $-(T, S)$, where $+(T, S)$ is all of the molecules of DNA in $T$ that contain $S$ as a substrand and $-(T, S)$ is all of the molecules of DNA in $T$ that do not contain $S$.

2. Merge. Given tubes $T_1$ and $T_2$, yield $\cup(T_1, T_2)$, where $\cup(T_1, T_2) = T_1 \cup T_2$. This operation pours two tubes into one, without any change in the individual strands.

3. Detect. Given a tube $T$, if $T$ includes at least one DNA molecule, we have “yes,” and if $T$ contains no DNA molecule, we have “no.”

4. Discard. Given a tube $T$, the operation will discard $T$.

5. Amplify. Given a tube $T$, the operation Amplify($T$, $T_1$, $T_2$) produces two new tubes $T_1$ and $T_2$ so that $T_1$ and $T_2$ are exact copies of $T$ ($T_1$ and $T_2$ are now identical) and $T$ becomes an empty tube.

6. Append. Given a tube $T$ and a short strand of DNA, $Z$, the operation will append $Z$ onto the end of every strand in $T$.

7. Append-head. Given a tube $T$ and a short strand of DNA, $Z$, the operation will append $Z$ onto the head of every strand in $T$.

8. Read. Given a tube $T$, the operation is used to describe a single molecule contained in tube $T$. Even if $T$ contains many different molecules, each encoding a different set of bases, the operation can give an explicit description of exactly one of them.
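For checking algorithms built from these eight operations, it helps to have a software model of a tube. The Python sketch below is a simulation aid, not a wet-lab protocol, and rests on simplifying assumptions: a tube is a multiset (`collections.Counter`) of strands, a strand is a tuple of labels written head first, and "contains $S$ as a substrand" is modeled as membership of a single label. The class and method names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Strand = Tuple[str, ...]  # modeling assumption: a strand is a tuple of labels, head first


class Tube:
    """A test tube as a multiset of strands."""

    def __init__(self, strands: Iterable[Strand] = ()):
        self.strands = Counter(strands)

    def extract(self, s: str) -> Tuple["Tube", "Tube"]:
        """Extract: return +(T, s) and -(T, s); the input tube is used up."""
        plus = Tube(x for x in self.strands.elements() if s in x)
        minus = Tube(x for x in self.strands.elements() if s not in x)
        self.strands.clear()
        return plus, minus

    def detect(self) -> bool:
        """Detect: "yes" (True) iff the tube holds at least one molecule."""
        return sum(self.strands.values()) > 0

    def discard(self) -> None:
        """Discard: throw the tube's contents away."""
        self.strands.clear()

    def amplify(self) -> Tuple["Tube", "Tube"]:
        """Amplify(T, T1, T2): two identical copies; T becomes empty."""
        t1, t2 = Tube(self.strands.elements()), Tube(self.strands.elements())
        self.strands.clear()
        return t1, t2

    def append(self, z: str) -> None:
        """Append: add z onto the end of every strand."""
        self.strands = Counter({x + (z,): c for x, c in self.strands.items()})

    def append_head(self, z: str) -> None:
        """Append-head: add z onto the head of every strand."""
        self.strands = Counter({(z,) + x: c for x, c in self.strands.items()})

    def read(self) -> Strand:
        """Read: describe exactly one molecule in the tube."""
        if not self.detect():
            raise ValueError("cannot read an empty tube")
        return next(iter(self.strands))


def merge(t1: Tube, t2: Tube) -> Tube:
    """Merge: pour two tubes into one without changing any strand."""
    merged = Tube(list(t1.strands.elements()) + list(t2.strands.elements()))
    t1.strands.clear()
    t2.strands.clear()
    return merged


t = Tube([("x1", "y0"), ("x0", "y0"), ("x1", "y1")])
ones, zeros = t.extract("x1")
print(ones.detect(), zeros.detect(), t.detect())  # True True False
```

Using a multiset rather than a set keeps track of copy numbers, which matters after Amplify; the single-label substrand test is the main simplification relative to real hybridization-based extraction.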


F. Comparisons of Various Famous DNA Models

Based on the solution space of splints in the Adleman–Lipton model, the methods in [7], [17]–[20], [35] could be applied to solve the traveling salesman problem, the dominating-set problem, the vertex cover problem, the clique problem, the independent-set problem, the three-dimensional matching problem, the set-packing problem, the set-cover problem, and the problem of exact cover by three-sets. Lipton et al. [51] indicated that DNA-based computing, using a solution space of splints, could easily break the Data Encryption Standard (DES). These methods use volumes of DNA that increase exponentially and time that increases linearly.

Bach et al. [33] proposed molecular algorithms for the three-coloring problem and for the independent set problem that use a DNA volume exponential in $n$ but smaller than $2^n$ and run in time polynomial in $n$ and $m$, where $n$ and $m$ are, respectively, the number of vertices and the number of edges in the problems resolved. Fu [21] presented polynomial-time algorithms with similarly reduced exponential volumes for the three-SAT problem, the three-coloring problem, and the independent set problem. Although the volumes in [21], [33] are smaller, constructing them is more difficult and the time complexity is higher.

Ouyang et al. [4] showed that enzymes could be used to solve the NP-complete clique problem. Because the maximum number of vertices that they can process is limited to 27, the maximum number of DNA strands for solving this problem is $2^{27}$ [4]. Shin et al. [8] presented an encoding scheme for decreasing the error rate of hybridization. This method [8] can be applied to the traveling salesman problem, representing integers and real values with fixed-length codes. Arita et al. [5] and Morimoto et al. [6] proposed a new molecular experimental technique and a solid-phase method, respectively, to find a Hamiltonian path. Amos [13] proposed a parallel filtering model for resolving the Hamiltonian path problem, the subgraph isomorphism problem, the three-vertex-colorability problem, the clique problem, and the independent-set problem. The methods in [5], [6], and [13] have lowered the error rate in real molecular experiments. In [26], [27], and [30], the methods for DNA-based computing by self-assembly use DNA nanostructures, called tiles, which offer expressive computational power and convenient input and output (I/O) mechanisms. DNA tiles also have a lower error rate in self-assembly.

One of the earliest attempts to perform arithmetic operations (addition of two positive binary numbers) using DNA is by Guarnieri et al. [38], who encoded the bit values zero and one as different single-stranded DNAs depending on their positions and the operands in which they appear. Gupta et al. [39] performed logic and arithmetic operations using a fixed bit encoding of the full corresponding truth tables. Qiu and Lu [40] applied a substitution operation to insert results into the operand strands (by encoding all possible outputs of the bit-by-bit operation along with the second operand). Ogihara and Ray [41], as well as Amos and Dunne [42], proposed methods to realize any Boolean circuit (with bounded fan-in) using DNA strands in a constructive fashion. Other suggestions for performing all basic arithmetic operations are by Atanasiu [43] using P systems, by Frisco [44] using the splicing operation under general H systems, and by Hubert and Schuler [45]. Barua et al. [31] proposed a recursive DNA algorithm for adding two binary numbers and bounded its number of biosteps and its number of distinct DNA strand types in terms of $n$, where $n$ is the size of the binary string representing the larger of the two numbers.

Adleman et al. [14] proposed a sticker-based model to reduce the error rate of hybridization in the Adleman–Lipton model. Their model can be used to determine solutions of an instance of the set cover problem. Adleman et al. [52] also pointed out that the Data Encryption Standard could easily be broken using a solution space of stickers in the sticker-based model. Pérez-Jiménez et al. [15] employed the sticker-based model [14] to solve knapsack problems. In our previous work, Chang et al. [25], [32], [36], [53] also employed the sticker-based model and the Adleman–Lipton model to deal with Cook's theorem [9], [10], the set-splitting problem, the subset-sum problem, and the dominating-set problem while decreasing the error rate of hybridization.

III. FACTORING THE PRODUCT OF TWO LARGE PRIME NUMBERS OF $k$ BITS

A. RSA Public-Key Cryptosystem

In the RSA cryptosystem [34], a participant creates his public and secret keys with the following steps. The first step is to select at random two large prime numbers $p$ and $q$, assuming that $p$ and $q$ are both $k$ bits long. The second step is to compute $n$ by the equation $n = p \times q$. The third step is to select a small odd integer $e$ that is relatively prime to $\phi(n)$, which is equal to $(p - 1) \times (q - 1)$. The fourth step is to compute $d$ as the multiplicative inverse of $e$ modulo $\phi(n)$. The fifth step is to publish the pair $(e, n)$ as his RSA public key. The sixth step is to keep secret the pair $(d, n)$ as his secret key. No method has been found to factor $n$ as $p \times q$ in a reasonable amount of time.
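The six key-generation steps can be traced with toy numbers. The Python sketch below uses the textbook-sized primes 61 and 53, which are far too small for real use, and `pow(e, -1, phi)`, which computes a modular inverse in Python 3.8 and later. It then encrypts and decrypts one small message to show that the two key pairs fit together.

```python
from math import gcd

# Toy parameters for illustration only; real RSA primes have hundreds of digits.
p, q = 61, 53                 # step 1: two primes (here tiny)
n = p * q                     # step 2: n = p * q = 3233
phi = (p - 1) * (q - 1)       # phi(n) = (p - 1)(q - 1) = 3120
e = 17                        # step 3: small odd e, relatively prime to phi(n)
assert e % 2 == 1 and gcd(e, phi) == 1
d = pow(e, -1, phi)           # step 4: d = e^{-1} mod phi(n); needs Python 3.8+
public_key = (e, n)           # step 5: published
secret_key = (d, n)           # step 6: kept secret

message = 65
ciphertext = pow(message, e, n)    # encrypt with the public key
recovered = pow(ciphertext, d, n)  # decrypt with the secret key
print(public_key, secret_key, ciphertext, recovered)  # (17, 3233) (2753, 3233) 2790 65
```

Anyone who can factor 3233 into 61 and 53 can repeat step 4 and obtain $d$, which is why the difficulty of factoring $n$ carries the security of the scheme.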

B. Solution Space of DNA Strands for Every Unsigned Integer of $k$ Bits

Suppose that an unsigned integer of $k$ bits is represented as a $k$-bit binary number, $m_k m_{k-1} \ldots m_1$, where the value of each bit $m_j$ is either one or zero for $1 \le j \le k$. The bits $m_k$ and $m_1$ represent, respectively, the most significant bit and the least significant bit of $m_k m_{k-1} \ldots m_1$. An unsigned integer of $k$ bits ranges in value from 0 to $2^k - 1$. Following [22], [46], for every bit $m_j$, two distinct 15-base value sequences are designed: one represents the value zero for $m_j$ and the other represents the value one for $m_j$. For convenience, we let $m_j^1$ denote that the value of $m_j$ is one and $m_j^0$ denote that the value of $m_j$ is zero. The following algorithm is used to construct the solution space of DNA strands for $2^k$ different unsigned integer values.

Procedure InitialSolution($T_0$)
(1) For $j = k$ down to 1
  (1a) Amplify($T_0$, $T_1$, $T_2$).
  (1b) Append($T_1$, $m_j^1$).
  (1c) Append($T_2$, $m_j^0$).
  (1d) $T_0 = \cup(T_1, T_2)$.
EndFor
EndProcedure


TABLE I
RESULT FOR TUBE $T_0$ GENERATED BY THE ALGORITHM INITIALSOLUTION($T_0$)

Consider that the number of bits of $m_k m_{k-1} \ldots m_1$ is 3. The eight values are, respectively, 000, 001, 010, 011, 100, 101, 110, and 111. Tube $T_0$ is an empty tube and is regarded as the input tube for the algorithm InitialSolution($T_0$). Because the value of $k$ is three, Steps (1a) through (1d) will be run three times. After the first execution of Step (1a), tube $T_0 = \emptyset$, tube $T_1 = \emptyset$, and tube $T_2 = \emptyset$. Next, after the first execution of Steps (1b) and (1c), tube $T_1 = \{m_3^1\}$ and tube $T_2 = \{m_3^0\}$. After the first execution of Step (1d), tube $T_0 = \{m_3^1, m_3^0\}$, tube $T_1 = \emptyset$, and tube $T_2 = \emptyset$. Then, after the second execution of Step (1a), tube $T_0 = \emptyset$, tube $T_1 = \{m_3^1, m_3^0\}$, and tube $T_2 = \{m_3^1, m_3^0\}$. After the remaining operations are performed, tube $T_1 = \emptyset$, tube $T_2 = \emptyset$, and the result for tube $T_0$ is shown in Table I. Lemma 1 demonstrates the correctness of the algorithm InitialSolution($T_0$).

Lemma 1: The algorithm InitialSolution($T_0$) constructs the solution space of DNA strands for $2^k$ different unsigned integer values.

Proof: The algorithm InitialSolution($T_0$) is implemented by means of the amplify, append, and merge operations. Each execution of Step (1a) amplifies tube $T_0$ to generate two new tubes, $T_1$ and $T_2$, which are copies of $T_0$; tube $T_0$ then becomes empty. Then, Step (1b) appends a DNA sequence representing the value one for $m_j$ onto the end of every strand in tube $T_1$. That is, the integers whose $j$th bit is one appear in tube $T_1$. Step (1c) likewise appends a DNA sequence representing the value zero for $m_j$ onto the end of every strand in tube $T_2$, so the integers whose $j$th bit is zero appear in tube $T_2$. Next, Step (1d) pours tubes $T_1$ and $T_2$ into tube $T_0$, so the DNA strands in tube $T_0$ include DNA sequences of $m_j = 1$ and $m_j = 0$. Because each iteration doubles the number of distinct strands, at the end of Step (1) tube $T_0$ consists of $2^k$ DNA sequences representing $2^k$ different unsigned integer values.

From InitialSolution($T_0$), it takes $k$ amplify operations, $2 \times k$ append operations, $k$ merge operations, and three test tubes to construct the solution space of DNA strands. A value sequence for every bit contains 15 bases. Therefore, the length of a DNA strand encoding an unsigned integer value of $k$ bits is $15 \times k$ bases, consisting of the concatenation of one value sequence for each bit.
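To check the walkthrough, Table I, and the operation counts together, here is a small Python simulation of InitialSolution($T_0$). It is a sketch under stated assumptions: tubes are multisets (`collections.Counter`) of strands, each strand is a tuple of hypothetical labels such as `"m3^1"` standing for $m_3^1$, and the initially empty tube $T_0$ is modeled as holding one strand with no value sequences yet, so that the first Append creates the one-segment strands.

```python
from collections import Counter

BASES_PER_VALUE_SEQUENCE = 15


def append(tube: Counter, label: str) -> Counter:
    """Append(T, Z): add Z onto the end of every strand in T."""
    return Counter({strand + (label,): count for strand, count in tube.items()})


def initial_solution(k: int):
    """Simulate InitialSolution(T0) and count the operations it uses.

    Modeling assumption: the 'empty' input tube T0 holds one strand with no
    value sequences yet, so the first Append creates one-segment strands.
    """
    ops = Counter()
    t0 = Counter({(): 1})
    for j in range(k, 0, -1):
        t1, t2 = Counter(t0), Counter(t0)   # (1a) Amplify(T0, T1, T2) ...
        t0 = Counter()                      # ... and T0 becomes empty
        ops["amplify"] += 1
        t1 = append(t1, f"m{j}^1")          # (1b) Append(T1, m_j^1)
        t2 = append(t2, f"m{j}^0")          # (1c) Append(T2, m_j^0)
        ops["append"] += 2
        t0 = t1 + t2                        # (1d) T0 = merge of T1 and T2
        ops["merge"] += 1
    return t0, ops


def decode(strand: tuple) -> str:
    """Read the bit string m_k ... m_1 off a strand's labels."""
    return "".join(label[-1] for label in strand)


t0, ops = initial_solution(3)
print(sorted(decode(s) for s in t0))  # the eight values 000 through 111 (cf. Table I)
print(dict(ops), {BASES_PER_VALUE_SEQUENCE * len(s) for s in t0})
```

For $k = 3$ the counts come out as 3 amplify, 6 append, and 3 merge operations, and every strand is $15 \times 3 = 45$ bases, in line with the analysis above.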

C. Construction of the Product of Two Large Prime Numbers of $k$ Bits

Assume that the length of $n$, the product of two large prime numbers of $k$ bits defined in Section III-A, is $2 \times k$ bits. Also suppose that the product $n$ is used to represent the minuend (dividend) and the difference for successive compare, shift, and subtract operations in a divider. When $n$ is divided by $m_k m_{k-1} \ldots m_1$, the unsigned integer of $k$ bits defined in Section III-B, a divisor greater than one is one of the two large prime numbers if the remainder is equal to zero. Assume that in a divider the length of the dividend is $2 \times k$ bits and the length of the divisor is $k$ bits. The division is then finished through at most $k + 1$ successive compare, shift, and subtract operations. Therefore, suppose that $n$ is represented as a $(2 \times k)$-bit binary number, $n_{o,2k} n_{o,2k-1} \ldots n_{o,1}$, where the value of each bit $n_{o,q}$ is either one or zero for $1 \le o \le k + 2$ and $1 \le q \le 2 \times k$. The bits $n_{o,2k}$ and $n_{o,1}$, respectively, represent the most significant bit and the least significant bit of $n_{o,2k} n_{o,2k-1} \ldots n_{o,1}$.

TABLE II
RESULT FOR TUBE $T_0$ GENERATED BY THE ALGORITHM INITIALPRODUCT($T_0$)

One binary number, $n_{o,2k} n_{o,2k-1} \ldots n_{o,1}$, and another binary number, $n_{o+1,2k} n_{o+1,2k-1} \ldots n_{o+1,1}$, are, respectively, used to represent the minuend and the difference for the $o$th round of compare, shift, and subtract operations. That is, the binary number $n_{o+1,2k} n_{o+1,2k-1} \ldots n_{o+1,1}$ is the minuend for the $(o+1)$th round of compare, shift, and subtract operations.
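The parallel divider evaluates the remainder of $n$ divided by every $k$-bit candidate at once. As a purely sequential, classical illustration of what is being computed (not of the DNA procedure or its exact step schedule), the Python sketch below uses textbook restoring division, with one shift, one compare, and at most one subtract per dividend bit, and then lists the candidates whose remainder is zero. The function names are hypothetical, and the 6-bit value 001111 is the example used later in this section. The trivial divisor 1 also leaves remainder zero, so it must be excluded before reading off a prime factor.

```python
def restoring_remainder(dividend: int, divisor: int, width: int) -> int:
    """Remainder by textbook restoring division over `width` dividend bits.

    Each iteration shifts in one dividend bit, compares the partial remainder
    with the divisor, and subtracts when the divisor fits.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    remainder = 0
    for i in range(width - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # shift
        if remainder >= divisor:                              # compare
            remainder -= divisor                              # subtract
    return remainder


def zero_remainder_divisors(n: int, k: int) -> list:
    """Check, one after another, every nonzero k-bit candidate a DNA tube holds at once."""
    return [m for m in range(1, 2 ** k) if restoring_remainder(n, m, 2 * k) == 0]


n, k = 0b001111, 3  # n = 15 = 3 x 5
print(zero_remainder_divisors(n, k))  # [1, 3, 5]; 1 is trivial, 3 and 5 are the factors
```

A classical computer must loop over all $2^k - 1$ candidates; the DNA approach instead holds every candidate as a separate strand and applies each compare, shift, and subtract step to all of them at once.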

For every bit $n_{o,q}$, two distinct 15-base value sequences were designed: one represents the value zero for $n_{o,q}$ and the other represents the value one for $n_{o,q}$. For convenience, we let $n_{o,q}^1$ denote that the value of $n_{o,q}$ is one and $n_{o,q}^0$ denote that the value of $n_{o,q}$ is zero. The following algorithm is used to construct a DNA strand for the value of $n$.

Procedure InitialProduct($T_0$)
(1) For $q = 1$ to $2 \times k$
  (1a) Append-head($T_0$, $n_{1,q}$).
EndFor
EndProcedure

Consider that the number of bits of $n$ is 6 and the value of $n$ is 001111. Tube $T_0$, with the result shown in Table I, is regarded as the input tube for the algorithm InitialProduct($T_0$). Because the value of $2 \times k$ is six, Step (1a) will be executed six times. The result after each execution of Step (1a) is shown in Table II. Lemma 2 proves the correctness of the algorithm InitialProduct($T_0$).

Lemma 2: A DNA strand for the value of $n$ can be constructed by InitialProduct($T_0$).

Proof: Refer to Lemma 1.

From InitialProduct($T_0$), it takes $2 \times k$ append-head operations and one test tube to construct a DNA strand. The part of the DNA strand encoding the value of $n$ is $30 \times k$ bases long, consisting of the concatenation of one value sequence for each bit.
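The effect of InitialProduct($T_0$) on the tube of Table I can be simulated in the same spirit. In this Python sketch, strands are tuples of hypothetical labels (`"m3^1"` for $m_3^1$, `"n1,6^0"` for $n_{1,6}^0$), the tube is a plain list, and the bit string of $n$ is written most significant bit first. Prepending $n_{1,q}$ for $q = 1$ up to $2 \times k$ leaves the most significant bit of $n$ at the head of every strand.

```python
from itertools import product


def solution_space(k: int) -> list:
    """The 2^k strands of Table I, each a tuple of labels m_k ... m_1 (head first)."""
    return [tuple(f"m{j}^{b}" for j, b in zip(range(k, 0, -1), bits))
            for bits in product("01", repeat=k)]


def initial_product(tube: list, n_bits: str) -> list:
    """Simulate InitialProduct(T0): for q = 1 .. 2k, Append-head(T0, n_{1,q}).

    n_bits is written most significant bit first, so n_{1,q} = n_bits[-q].
    """
    for q in range(1, len(n_bits) + 1):
        label = f"n1,{q}^{n_bits[-q]}"                 # hypothetical label for n_{1,q}
        tube = [(label,) + strand for strand in tube]  # Append-head on every strand
    return tube


k, n_bits = 3, "001111"  # n = 15 = 3 x 5, the example used in the text
tube = initial_product(solution_space(k), n_bits)
head = tube[0][:2 * k]
print("".join(label[-1] for label in head), len(tube), 15 * len(tube[0]))  # 001111 8 135
```

Each strand now carries $2 \times k = 6$ value sequences for $n$ ($30 \times k = 90$ bases) ahead of the $15 \times k = 45$ bases for $m_k \ldots m_1$.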


IV. DISCUSSION

The proposed algorithm (Algorithm 1) for factoring the product of two large prime numbers of $k$ bits is based on biological operations on a solution space of DNA strands, and it derives several advantages from them. First, the Adleman program [22], [46] was used to generate good DNA sequences to construct the solution space of DNA strands. Good DNA sequences decrease the rate of hybridization errors, which indicates that the proposed algorithm should also have a lower rate of hybridization errors.

Second, basic biological operations were employed to implement a parallel comparator, a parallel subtractor, and a parallel divider. This means that the proposed algorithm has the computational capability to perform subtraction and division. Adleman and his coauthors performed basic biological operations in a fully automated manner in their lab. Full automation is essential not only for speeding up computation but also for error-free computation.

Third, in Algorithm 1 for factoring the product of two large prime numbers of $k$ bits, the number of tubes is $O(1)$ and the number of DNA strands is $O(2^k)$, while the length of the longest DNA strand and the number of biological operations are polynomial in $k$. This implies that the proposed algorithm could readily be performed in a fully automated manner in a lab. Fourth, after $n$ is factored as $p \times q$ by Algorithm 1, an overheard encrypted message can easily be decoded on a classical computer.
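Once $p$ and $q$ are known, the remaining decoding is classical. A minimal Python sketch, assuming the toy public key $(e, n) = (17, 3233)$ and an intercepted ciphertext 2790 (textbook-sized values, not realistic ones), recomputes the secret exponent exactly as in the fourth key-generation step of Section III-A and decrypts. The helper name is hypothetical, and the modular inverse via `pow(e, -1, phi)` needs Python 3.8 or later.

```python
def recover_secret_exponent(p: int, q: int, e: int) -> int:
    """Recompute d = e^{-1} mod phi(n) from the factors p and q."""
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)


# Toy values: an intercepted public key (e, n) and ciphertext c, plus factors
# p and q that are assumed to have been found already.
e, n, c = 17, 3233, 2790
p, q = 61, 53
assert p * q == n

d = recover_secret_exponent(p, q, e)
print(d, pow(c, d, n))  # 2753 65
```

Both steps take time polynomial in the bit length of $n$ on a classical computer, so factoring is the only hard part of the attack.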

V. CONCLUSION

A general digital computer mainly contains a CPU and memory. The main function of the CPU is to perform mathematical computational tasks, and the main function of memory is to store the data needed for those tasks. On a general molecular computer, however, each datum needed for mathematical computational tasks is encoded by means of a DNA strand, and the tasks are performed by means of a DNA algorithm (a series of basic biological operations) on those DNA strands. The execution time of any basic biological operation is much longer than that of a digital mathematical instruction. Hence, to compensate for the execution time of basic biological operations, Adleman [2] indicated that an exponential number of DNA strands is necessary. In other words, a single basic biological operation applied to an exponential number of DNA strands can perform the work of an exponential number of digital mathematical instructions.

This is the first paper to demonstrate that the difficult problem of factoring the product of two large prime numbers of $k$ bits can be solved on a DNA-based computer. The proposed algorithm takes a number of steps that is polynomial in the input size, i.e., the number of binary digits of the product (integer) to be factored. The paper also shows that arithmetic operations can be performed directly with basic biological operations. The difficulty of factoring the product of two large prime numbers is the basis of public-key cryptosystems. However, this difficulty appears not to hold on a molecular computer, which indicates that public-key cryptosystems may be insecure. Furthermore, the paper presents the first example of molecular cryptanalysis of public-key cryptosystems.

Currently, the future of molecular computers is unclear. It is possible that in the future molecular computers will be the clear choice for performing massively parallel computations. However, there are still many technical difficulties to overcome before this becomes a reality. We hope that this paper helps to demonstrate that molecular computing is a technology worth pursuing.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for extensive comments and suggestions, which improved the presentation of this paper considerably.

REFERENCES

[1] R. R. Sinden, DNA Structure and Function. New York: Academic, 1994.

[2] L. Adleman, “Molecular computation of solutions to combinatorial problems,” Science, vol. 266, pp. 1021–1024, Nov. 11, 1994.

[3] R. J. Lipton, “DNA solution of hard computational problems,” Science, vol. 268, pp. 542–545, 1995.

[4] Q. Ouyang, P. D. Kaplan, S. Liu, and A. Libchaber, “DNA solution of the maximal clique problem,” Science, vol. 278, pp. 446–449, 1997.

[5] M. Arita, A. Suyama, and M. Hagiya, “A heuristic approach for the Hamiltonian path problem with molecules,” in Proc. 2nd Genetic Programming Conf. (GP ’97), pp. 457–462.

[6] N. Morimoto, M. Arita, and A. Suyama, “Solid phase DNA solution to the Hamiltonian path problem,” in Proc. 3rd DIMACS Workshop DNA Based Computers, vol. 48, 1999, pp. 93–101.

[7] A. Narayanan and S. Zorbala et al., “DNA algorithms for computing shortest paths,” in Proc. 3rd Annu. Conf. Genetic Programming 1998, J. R. Koza et al., Eds., pp. 718–724.

[8] S.-Y. Shin, B.-T. Zhang, and S.-S. Jun, “Solving traveling salesman problems using molecular programming,” in Proc. 1999 Congr. Evolutionary Computation (CEC ’99), vol. 2, pp. 994–1000.

[9] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 1990.

[10] M. R. Garey and D. S. Johnson, Computers and Intractability. San Francisco, CA: Freeman, 1979.

[11] D. Boneh, C. Dunworth, R. J. Lipton, and J. Sgall, “On the computational power of DNA,” Discrete Appl. Math. (Special Issue on Computational Molecular Biology), vol. 71, pp. 79–94, 1996.

[12] L. M. Adleman, “On constructing a molecular computer, DNA based computers,” in DNA Based Computers (Selected Papers from Proc. DIMACS Workshop DNA Based Computers ’95), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, R. Lipton and E. Baum, Eds., 1996, pp. 1–21.

[13] M. Amos, “DNA computation,” Ph.D. dissertation, Dept. Comput. Sci., Univ. Warwick, Coventry, U.K., 1997.

[14] S. Roweis, E. Winfree, R. Burgoyne, N. V. Chelyapov, M. F. Goodman, P. W. K. Rothemund, and L. M. Adleman, “A sticker based model for DNA computation,” in Proc. 2nd Annu. Workshop DNA Computing, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, L. Landweber and E. Baum, Eds., 1999, pp. 1–29.

[15] M. J. Perez-Jimenez and F. Sancho-Caparrini, “Solving knapsack problems in a sticker based model,” in Proc. 7th Annu. Workshop DNA Computing, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 2001, pp. 161–171.

[16] G. Paun, G. Rozenberg, and A. Salomaa, DNA Computing: New Computing Paradigms. New York: Springer-Verlag, 1998.

[17] W.-L. Chang and M. Guo, “Solving the dominating-set problem in the Adleman–Lipton model,” in Proc. 3rd Int. Conf. Parallel and Distributed


[18] W.-L. Chang and M. Guo, “Solving the clique problem and the vertex cover problem in the Adleman–Lipton model,” in Proc. IASTED Int. Conf. Networks, Parallel and Distributed Processing, and Applications, 2002, pp. 431–436.

[19] W.-L. Chang and M. Guo, “Solving NP-complete problem in the Adleman–Lipton model,” in Proc. 2002 Int. Conf. Computer and Information Technology, pp. 157–162.

[20] W.-L. Chang and M. Guo, “Resolving the 3-dimensional matching problem and the set-packing problem in the Adleman–Lipton model,” in Proc. IASTED Int. Conf. Networks, Parallel and Distributed Processing, and Applications, 2002, pp. 455–460.

[21] B. Fu, “Volume bounded molecular computation,” Ph.D. Thesis, Dept. Comput. Sci., Yale Univ., New Haven, CT, 1997.

[22] R. S. Braich, C. Johnson, P. W. K. Rothemund, D. Hwang, N. Chelyapov, and L. M. Adleman, “Solution of a satisfiability problem on a gel-based DNA computer,” in Proc. 6th Int. Conf. DNA Computation, 2000, pp. 27–42.

[23] K. Mir, “A restricted genetic alphabet for DNA computing,” in DNA Based Computers II: Proc. DIMACS Workshop 1996, vol. 44, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, E. B. Baum and L. F. Landweber, Eds., 1998, pp. 243–246.

[24] A. R. Cukras, D. Faulhammer, R. J. Lipton, and L. F. Landweber, “Chess games: a model for RNA-based computation,” in Proc. 4th DIMACS Meeting DNA Based Computers, Jun. 1998, pp. 27–37.

[25] M. Ho, W.-L. Chang, and M. Guo, “Is Cook’s theorem correct for DNA-based computing: toward solving the NP-complete problems on a DNA-based supercomputer model,” J. Parallel Distrib. Sci. Eng. Comput., to be published.

[26] J. H. Reif, T. LaBean, and H. Seeman, “Challenges and applications for self-assembled DNA-nanostructures,” in Proc. 6th DIMACS Workshop DNA Based Computers, 2002, pp. 145–172.

[27] T. H. LaBean, E. Winfree, and J. H. Reif, “Experimental progress in computation by self-assembly of DNA tilings,” Theor. Comput. Sci., vol. 54, pp. 123–140, 2000.

[28] S. A. Cook, “The complexity of theorem-proving procedures,” in Proc. 3rd Annu. ACM Symp. Theory of Computing, 1971, pp. 151–158.

[29] R. M. Karp, “Reducibility among combinatorial problems,” in Complexity of Computer Computations. New York: Plenum, 1972, pp. 85–103.

[30] C. Mao, T. H. LaBean, J. H. Reif, and N. C. Seeman, “Logical computation using algorithmic self-assembly of DNA triple-crossover molecules,” Nature, vol. 407, pp. 493–496, 2000.

[31] R. Barua and J. Misra, “Binary arithmetic for DNA computers,” in Proc. 8th Int. Workshop DNA Based Computers, 2002, pp. 124–132.

[32] W.-L. Chang, M. Guo, and M. Ho, “Solving the set-splitting problem in sticker-based model and the Adleman–Lipton model,” Future Gener. Comput. Syst., vol. 20, no. 5, pp. 875–885, Jun. 2004.

[33] E. Bach, A. Condon, E. Glaser, and C. Tanguay, “DNA models and algorithms for NP-complete problems,” in Proc. 11th Annu. Conf. Structure in Complexity Theory, 1996, pp. 290–299.

[34] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Commun. ACM, vol. 21, pp. 120–126, 1978.

[35] W.-L. Chang and M. Guo, “Solving the set-cover problem and the problem of exact cover by 3-sets in the Adleman–Lipton model,” BioSystems, vol. 72, no. 3, pp. 263–275, 2003.

[36] W.-L. Chang, M. Ho, and M. Guo, “Molecular solutions for the subset-sum problem on DNA-based supercomputing,” BioSystems, vol. 73, no. 2, pp. 117–130, 2004.

[37] R. P. Feynman, in Miniaturization, D. H. Gilbert, Ed. New York: Reinhold, 1961, pp. 282–296.

[38] F. Guarnieri, M. Fliss, and C. Bancroft, “Making DNA add,” Science, vol. 273, pp. 220–223, 1996.

[39] V. Gupta, S. Parthasarathy, and M. J. Zaki, “Arithmetic and logic operations with DNA,” in Proc. 3rd DIMACS Workshop DNA Based Computers, 1997, pp. 212–220.

[40] Z. F. Qiu and M. Lu, “Arithmetic and logic operations with DNA computers,” in Proc. 2nd IASTED Int. Conf. Parallel and Distributed Computing and Networks, 1998, pp. 481–486.

[41] M. Ogihara and A. Ray, “Simulating Boolean circuits on a DNA computer,” Univ. Rochester, Rochester, NY, Tech. Rep. TR631, Aug. 1996.

[42] M. Amos and P. E. Dunne, “DNA simulation of Boolean circuits,” University of Liverpool, Liverpool, U.K., Tech. Rep. CTAG-97 009, Dec. 1997.

[43] A. Atanasiu, “Arithmetic with membranes,” in Proc. Workshop Multiset Processing, 2000, pp. 1–17.

[44] P. Frisco, “Parallel arithmetic with splicing,” Romanian J. Inf. Sci. Technol., vol. 3, pp. 113–128, 2000.

[45] H. Hug and R. Schuler, “DNA based parallel computation of simple arithmetic,” in Proc. 7th Workshop DNA Based Computers, 2001, pp. 159–166.

[46] R. S. Braich, C. Johnson, P. W. K. Rothemund, D. Hwang, N. Chelyapov, and L. M. Adleman, “Solution of a 20-variable 3-SAT problem on a DNA computer,” Science, vol. 296, no. 5567, pp. 499–502, Apr. 2002.

[47] J. Watson, M. Gilman, J. Witkowski, and M. Zoller, Recombinant DNA, 2nd ed. San Francisco, CA: Freeman, 1992.

[48] J. Watson, N. Hopkins, J. Roberts et al., Molecular Biology of the Gene. Menlo Park, CA: Benjamin/Cummings, 1987.

[49] G. M. Blackburn and M. J. Gait, Nucleic Acids in Chemistry and Biology. Washington, DC: IRL, 1990.

[50] F. Eckstein, Oligonucleotides and Analogues. Oxford, U.K.: Oxford Univ. Press, 1991.

[51] D. Boneh, C. Dunworth, and R. J. Lipton, “Breaking DES Using a Molecular Computer,” Princeton Univ., Princeton, NJ, Tech. Rep. CS-TR-489-95, 1995.

[52] L. Adleman, P. W. K. Rothemund, S. Roweis, and E. Winfree, “On applying molecular computation to the data encryption standard,” in Proc. 2nd Annu. Workshop DNA Computing, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1999, pp. 31–44.

[53] W.-L. Chang, M. Ho, and M. Guo, “Fast parallel molecular solution to the dominating-set problem on massively parallel bio-computing,” Parallel Comput., vol. 30, no. 9–10, pp. 1109–1125, 2004.

Weng-Long Chang received the Ph.D. degree in computer science and information engineering from National Cheng Kung University in 1999.

He is currently an Assistant Professor with the Southern Taiwan University of Technology, Tainan. His research interests include molecular computing, and languages and compilers for parallel computing.

Minyi Guo (M’00) received the Ph.D. degree in information science from the University of Tsukuba in 1998.

From 1998 to 2000, he was a Research Scientist with NEC Soft, Ltd., Japan. From 2001 to 2004, he was a visiting professor at Georgia State University, Hong Kong Polytechnic University, and the University of New South Wales. He is currently a Professor in the Department of Computer Software, University of Aizu, Aizu-Wakamatsu City, Japan. He is the Editor-in-Chief of the Journal of Embedded Systems. He is also on the Editorial Board of the International Journal of High Performance Computing and Networking, the Journal of Embedded Computing, the Journal of Parallel and Distributed Scientific and Engineering Computing, and the International Journal of Computer and Applications. His research interests include parallel and distributed processing, parallelizing compilers, data parallel languages, data mining, molecular computing, and software engineering. Dr. Guo is a member of the Association for Computing Machinery, the IEEE Computer Society, the Information Processing Society of Japan (IPSJ), and the Institute of Electronics, Information and Communication Engineers (IEICE). He has served as general chair, program committee chair, or organizing committee chair for many international conferences. He is listed in Marquis Who’s Who in Science and Engineering.

Michael Shan-Hui Ho received the M.S. degree from St. Mary’s University and the Ph.D. degree in information science/computer science with a management and accounting minor from the University of Texas, Austin.

He has 25 years of industrial and academic experience in the computing field. He has worked as a Senior Software Engineer and Project Leader developing e-business Web and multimedia applications, and as a Senior Database Administrator for SQL clustered servers, Oracle, and DB2 databases, including network/systems LAN/WAN system administration, for major U.S. corporations and government organizations. He also has more than ten years of college teaching/research experience as an Assistant Professor and Research Analyst at Central Missouri State University, the University of Texas at Austin, and BPC International Institute. He is currently an Associate Professor at Southern Taiwan University of Technology, Tainan. His research interests include algorithms and computation theory, software engineering, database and data mining, parallel computing, quantum computing, and DNA computing.
