Fast parallel DNA-based algorithms for molecular computation: discrete logarithm


DOI 10.1007/s11227-009-0347-9


Weng-Long Chang · Shu-Chien Huang · Kawuu Weicheng Lin · Michael (Shan-Hui) Ho

Abstract Diffie and Hellman (IEEE Trans. Inf. Theory 22(6):644–654, 1976) wrote the paper in which the concept of a trapdoor one-way function was first proposed. The Diffie–Hellman public-key cryptosystem is an algorithm that converts input data into an unrecognizable encrypted form, and converts the unrecognizable data back into its original decrypted form. The security of the Diffie–Hellman public-key cryptosystem is based on the difficulty of solving the problem of discrete logarithms. In this paper, we demonstrate that basic biological operations can be applied to solve the problem of discrete logarithms. In order to achieve this, we propose DNA-based algorithms that formally verify our designed molecular solutions for solving the problem of discrete logarithms. Furthermore, this work indicates that public-key cryptosystems based on the difficulty of solving the problem of discrete logarithms are perhaps insecure.

Keywords Discrete logarithm · The public-key cryptosystems · Cryptography · Security technologies · Molecular cryptography · Biological-based supercomputing · Molecular-based supercomputing · DNA-based supercomputing

W.-L. Chang · K.W. Lin

Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, No. 415, Chien Kung Road, Kaohsiung City 807-78, Taiwan, Republic of China e-mail:[email protected]

K.W. Lin

e-mail:[email protected]

S.-C. Huang

Department of Computer Science, National PingTung University of Education, No. 4-18 Ming Shen Road, Pingtung 900, Taiwan, Republic of China

e-mail:[email protected]

M. (S.-H.) Ho

Computer Center and Institute of Electrical Engineering, National Taipei University, 151, University Rd., San Shia 237, Taipei County, Taiwan, Republic of China


1 Introduction

Feynman first proposed molecular computation in 1961, but his idea was not implemented experimentally for a few decades [1]. In 1994, Adleman [2] succeeded in solving an instance of the Hamiltonian path problem in a test tube, just by handling DNA strands. Diffie and Hellman [3] wrote the paper in which the concept of a trapdoor one-way function was proposed. The Diffie–Hellman public-key cryptosystem [3] is a popular cryptosystem and is one of the primary cryptosystems used for security on the Internet and World Wide Web.

DES (the United States Data Encryption Standard) is one of the most widely used cryptographic systems. It produces a 64-bit ciphertext from a 64-bit plaintext under the control of a 56-bit key. A cryptanalyst obtains a plaintext and its corresponding ciphertext and wishes to determine the key used to perform the encryption. The most naive approach to this problem, called the plaintext-ciphertext attack, is to try all $2^{56}$ keys, encrypting the plaintext under each key until a key that produces the ciphertext is found. Adleman and his co-authors [4] provided a description of such an attack using the sticker model of molecular computation. Start with approximately $2^{56}$ identical ssDNA memory strands, each 11,580 nucleotides long. Each memory strand contains 579 contiguous blocks, each 20 nucleotides long. As is appropriate in the sticker model, there are 579 stickers, one complementary to each block. Memory strands with annealed stickers are called memory complexes. When the $2^{56}$ memory complexes have half of their sticker positions occupied at the end of the computation, they weigh approximately 0.7 g and, in solution at 5 g/liter, would occupy approximately 140 ml. Hence, the volume of each of the 1303 tubes need be no more than 140 ml. It follows that the 1303 tubes occupy, at most, 182 liters and can, for example, be arrayed in a space 1 m long, 1 m wide, and 18 cm deep.

Adleman and his co-authors [4] indicated that at the end of the computation for breaking DES, $2^{56}$ × (56 key bits + 64 ciphertext bits) pairs were generated and processed. Adleman and his co-authors [4] also pointed out that this codebook for breaking DES has approximately $2^{63}$ ($8 \times 10^{18}$) bits of information (the equivalent of approximately one billion 1 gigabyte CDs). The actual running time of the algorithm for breaking DES depends on how fast the operations can be performed. If each operation requires one day, then the computation for breaking DES will require 18 years. If each operation requires one hour, then the computation will require approximately nine months. If each operation can be completed in one minute, then the computation will take five days. Finally, if the effective duration of a step can be reduced to one second, then the effort for breaking DES will require two hours. While it has been argued that special purpose electronic hardware [4] or massively parallel supercomputers (the IBM Blue Gene/L machine is capable of 183.5 TFLOPS, or $183.5 \times 10^{12}$ floating-point operations per second) might be used to break DES in a reasonable amount of time, it appears that today’s most powerful sequential machines would be unable to accomplish the task.

In this paper, we describe novel DNA-based algorithms for a range of binary operations, consisting of bitwise and full comparison, left shift, addition, subtraction, modular arithmetic, and assignment. We also show how these smaller modules may be combined to produce an algorithm for solving the problem of discrete logarithm.


The rest of the paper is organized as follows. In Sect. 2, we introduce the development of molecular computing. In Sect. 3, we provide the motivation for writing the article and the formal model of computation within which the various algorithms are expressed. In Sect. 4, we give a high-level description of our algorithm to solve the problem of discrete logarithms. By breaking this down into sub-modules in Sect. 5, we show the operation of the various novel algorithms for comparative and arithmetic operations. In Sect. 6, we propose the attacking method to break the Diffie–Hellman public-key cryptosystem. In Sect. 7, we demonstrate that the time complexity of our algorithm is cubic in the input size. In Sect. 8, we show how the basic operations within our model may be implemented using standard laboratory operations on DNA strands. In Sect. 9, we conclude with a brief discussion.

2 The development of molecular computing

In [5], it was demonstrated that the optimal molecular solution of every NP-complete or NP-hard problem is determined by the characteristics of the problem. Molecular dynamics and (sequential) membrane systems from the viewpoint of Markov chain theory were proposed in [6]. Reif and LaBean [7] gave an overview of the past and current state of a selected part of the emerging research area of bio-molecular devices. There are DNA algorithms for solving many famous computational problems, including the 3-SAT problem [14], the binary integer programming problem [15], the dominating-set problem [16], three-vertex-coloring [17], the maximal clique and the set-packing problems [18], the set-splitting problem [19], the set-cover problem and the problem of exact cover by 3-sets [20], subset-production [21], real DNA experiments of Knapsack problems [22], and the set-partition problem [23]. One potentially significant area of application for DNA algorithms is the breaking of encryption schemes [24,25,27]. In [28], the design and experimental implementation of DNA-based digital logic circuits were reported, and AND, OR, and NOT gates, signal restoration, amplification, feedback, and cascading were also demonstrated. Kari et al. [29] recalled a list of known properties of DNA languages which are free of certain types of undesirable bonds, and then introduced a general framework in which each of these properties can be characterized by a solution of a uniform formal language inequation.

Wu and Seeman [8] described computation using a DNA strand as the basic unit and used this unit to achieve the function of multiplication. In [9], a second-generation deoxyribozyme-based automaton, MAYA-II, was reported; it plays a complete game of tic-tac-toe according to a perfect strategy, integrating 128 deoxyribozyme-based logic gates, 32 input DNA molecules, and 8 two-channel fluorescent outputs across 8 wells. The first direct observations of tile-based DNA self-assembly in solution, using fluorescent nanotubes composed of a single tile, were presented in [10]. In [11], it was found that as the range of correlations increases, the capacity to distinguish between species on the basis of this correlation profile improves and requires ever shorter sequence segments to obtain a full species separation. In [12], it was shown that “open” tweezers exist in a single conformation with minimal FRET efficiency. In [13], the first algorithm for calculating the partition function of an unpseudoknotted complex of multiple interacting nucleic acid strands was proposed. In [26], Zhang and Winfree presented an allosteric DNA molecule that, in its active configuration, catalyzes a noncovalent DNA reaction.

In [40], Kershner et al. described the use of electron-beam lithography and dry oxidative etching to create DNA origami-shaped binding sites on technologically useful materials, such as SiO2 and diamond-like carbon. In buffer with ∼100 mM MgCl2, DNA origami binds with high selectivity and good orientation: 70–95% of sites have individual origami aligned with an angular dispersion (±1 s.d.) as low as ±10° (on diamond-like carbon) or ±20° (on SiO2). In [41], Barish et al. presented a programmable DNA origami seed that can display up to 32 distinct binding sites and demonstrated the use of seeds to nucleate three types of algorithmic crystals. In the simplest case, the starting materials are a set of tiles that can form crystalline ribbons of any width; the seed directs assembly of a chosen width with >90% yield. Increased structural diversity is obtained by using tiles that copy a binary string from layer to layer; the seed specifies the initial string and triggers growth under near-optimal conditions where the bit copying error rate is <0.2%. Increased structural complexity is achieved by using tiles that generate a binary counting pattern; the seed specifies the initial value for the counter. Self-assembly proceeds in a one-pot annealing reaction involving up to 300 DNA strands containing >17 kb of sequence information.

3 Motivation and our model

The Diffie–Hellman public-key cryptosystem [3] is an algorithm that converts input data into an unrecognizable encrypted form, and converts the unrecognizable data back into its original decrypted form. The security of the Diffie–Hellman public-key cryptosystem is based on the difficulty of solving the problem of discrete logarithm. No known method can solve the problem of discrete logarithm in a reasonable amount of time.

In the following subsection, we describe our formal model of computation, within which we express the various algorithms that are combined to form the overall method for solving the problem of discrete logarithm. We first introduce it only in terms of abstract operations performed on multisets of strings over some alphabet Σ. The biological implementation of the model is introduced in Sect. 8. Within our model, a computation starts and ends with zero or more multisets of strings. An algorithm is made of a sequence of operations performed on one or more multisets of strings. We note in passing that this model is sufficiently powerful to solve any problem in the complexity class NP [2,4,17,30,31].

3.1 Operations

Here we describe the basic legal operations on multisets (henceforth referred to as tubes) from [2,4,17]:


1. Extract. Given a tube T and a short single strand of DNA, s, the operation produces two new tubes, +(T, s) and −(T, s). Tube +(T, s) consists of all of the molecules of DNA in T which contain s as a sub-strand, and tube −(T, s) consists of all of the molecules of DNA in T which do not contain s as a sub-strand.

2. Merge. Given any n tubes $T_1, \ldots, T_n$, the operation yields $\mathrm{Merge}(T_1, \ldots, T_n) = \bigcup_{i=1}^{n} T_i = T_1 \cup T_2 \cup \cdots \cup T_n$. That is, the contents of the n tubes are poured into one, without any change in the individual strands.

3. Discard. Given a tube T, the operation sets T to be an empty set ($T \leftarrow \emptyset$).

4. Detect. Given a tube T, the operation returns true if T includes at least one DNA molecule ($T \neq \emptyset$), otherwise it returns false.

5. Amplify. Given a tube T, the operation produces a number of identical copies, $T_i$, of tube T, and then performs Discard(T).

6. Concatenate($s_1$, $s_2$). Given two strands of DNA, $s_1$ and $s_2$, the operation returns a new strand of DNA, comprised of the concatenation of $s_1$ and $s_2$. If $s_1$ is a null strand of DNA, it returns $s_2$, and if $s_2$ is a null strand of DNA, it returns $s_1$.

7. Append-head(T, s). Given a non-empty tube T and a short strand of DNA, s, the operation first creates a null tube, U, and then, in parallel, for each string $t_i \in T$ performs the following: $T \leftarrow \mathrm{Merge}(U, \mathrm{Concatenate}(s, t_i))$. If T is initially empty, then after the operation is performed, T contains only s.

8. Read. Given a tube T, the operation is used to describe a single molecule, which is contained in tube T. Even if T contains many different molecules, each encoding a different set of bases, the operation can give an explicit description of exactly one of them.
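The operations above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the biological implementation of Sect. 8: a tube is taken to be a Python `Counter` (a multiset) of strands, a strand is a tuple of symbolic labels standing in for 15-base sequences, and all function names are hypothetical.

```python
from collections import Counter

# Assumption: a tube is a multiset of strands; a strand is a tuple of labels
# such as "e0^1" that stand in for the 15-base value sequences.

def extract(tube, s):
    """Return (+(T, s), -(T, s)): strands containing label s, and the rest."""
    plus, minus = Counter(), Counter()
    for strand, count in tube.items():
        (plus if s in strand else minus)[strand] += count
    return plus, minus

def merge(*tubes):
    """Pour any number of tubes into one new tube."""
    total = Counter()
    for t in tubes:
        total.update(t)
    return total

def discard(tube):
    """Empty the tube in place."""
    tube.clear()

def detect(tube):
    """True if the tube holds at least one strand."""
    return sum(tube.values()) > 0

def amplify(tube, copies=2):
    """Return identical copies of the tube, then empty the original."""
    result = [Counter(tube) for _ in range(copies)]
    tube.clear()
    return result

def append_head(tube, s):
    """Prepend label s to every strand; an empty tube ends up holding only s."""
    if not tube:
        tube[(s,)] += 1
        return
    updated = Counter({(s,) + strand: c for strand, c in tube.items()})
    tube.clear()
    tube.update(updated)

def read(tube):
    """Describe exactly one strand from the tube (None if it is empty)."""
    return next(iter(tube.elements()), None)
```

For instance, appending `"e0^1"` to an empty tube, amplifying it into two copies, and appending `"e1^1"` and `"e1^0"` to the copies yields, after a merge, a tube from which `extract` separates the two strands by the label `"e1^1"`.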

3.2 Representation scheme

We now introduce our scheme for the representation of unsigned integers. Because this scheme consists of specific features required by the biological implementation of our algorithms, we denote $\Sigma = \{A, G, C, T\}$. An unsigned integer of k bits, e, is represented as a k-bit binary number, $e_{k-1} \ldots e_0$, where the value of each bit $e_j$ is either 1 or 0 for $0 \le j \le k-1$. The bits $e_{k-1}$ and $e_0$ represent, respectively, the most significant bit and the least significant bit of e. From [30,31], for each bit $e_j$, two distinct 15-base value sequences over the alphabet $\Sigma$ are designed. One represents the value “0” for $e_j$ and the other represents the value “1” for $e_j$. For the sake of convenience in our presentation, assume that $e_j^1$ denotes the value of $e_j$ to be 1 and $e_j^0$ denotes the value of $e_j$ to be 0.
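To make the representation concrete, the following minimal sketch maps an unsigned integer to a list of symbolic labels of the form `e2^1`, one per bit, most significant bit first. The label format and the helper names `encode_bits` and `decode_bits` are assumptions made for illustration; the actual scheme uses distinct 15-base DNA sequences in place of these labels.

```python
def encode_bits(name, value, k):
    """Return labels name_{k-1}^b ... name_0^b, most significant bit first."""
    if not 0 <= value < 2 ** k:
        raise ValueError("value does not fit in k bits")
    return [f"{name}{j}^{(value >> j) & 1}" for j in range(k - 1, -1, -1)]

def decode_bits(labels):
    """Recover the integer from labels ordered most significant bit first."""
    value = 0
    for label in labels:
        bit = int(label.rsplit("^", 1)[1])
        value = (value << 1) | bit
    return value

print(encode_bits("e", 5, 3))  # ['e2^1', 'e1^0', 'e0^1']
```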

4 Molecular solutions of discrete logarithms

In Sect. 4.1, we introduce the definition of discrete logarithm. In Sect. 4.2, we describe a pseudo algorithm to solve the problem of discrete logarithm. In Sect. 4.3, we propose a DNA-based algorithm to solve the problem of discrete logarithm.


4.1 The introduction of discrete logarithms

For any integer d and any positive integer n, there are unique integers s and r such that $0 \le r < n$ and $d = s * n + r$. The value $s = \lfloor d/n \rfloor$ is the quotient of the division. The value $r = d \bmod n$ is the remainder of the division. We have that $n \mid d$ if and only if $d \bmod n = 0$. Given a well-defined notion of the remainder of one integer when divided by another, it is convenient to provide special notation to indicate equality of remainders. If $(d \bmod n) = (b \bmod n)$, we write $d \equiv b \pmod n$ and say that d is equivalent to b, modulo n. In other words, $d \equiv b \pmod n$ if d and b have the same remainder when divided by n. The integers can be divided into n equivalence classes according to their remainders modulo n. The equivalence class modulo n containing an integer d is $[d]_n = \{d + h * n : h \text{ is an integer}\}$. The set of all such equivalence classes is $Z_n = \{[d]_n : 0 \le d \le n-1\}$. One often sees the definition $Z_n = \{0, 1, \ldots, n-1\}$ [32].

The greatest common divisor of two integers d and n, not both zero, is the largest of the common divisors of d and n; it is denoted $\gcd(d, n)$. Two integers d and n are said to be relatively prime if their only common divisor is 1, that is, if $\gcd(d, n) = 1$. Because the equivalence classes of two integers uniquely determine the equivalence class of their product, we define multiplication modulo n, denoted $*_n$, as follows: $[d]_n *_n [h]_n = [d * h]_n$. Using the definition of multiplication modulo n, we define the multiplicative group modulo n as $(Z_n^*, *_n)$, where $Z_n^* = \{[d]_n \in Z_n : \gcd(d, n) = 1\}$.

Just as it is natural to consider the multiples of a given element d modulo n, it is often natural to consider the sequence of powers of d modulo n, where $d \in Z_n^*$: $d^0, d^1, d^2, \ldots$, modulo n. Indexing from 0, the 0th value in this sequence is $d^0 \bmod n = 1$, and the ith value is $d^i \bmod n$. We denote by $\langle d \rangle$ the subgroup of $Z_n^*$ generated by d, and we denote by $\mathrm{ord}_n(d)$ (the “order of d, modulo n”) the order of d in $Z_n^*$. For example, $\langle 2 \rangle = \{1, 2, 4\}$ in $Z_7^*$, and $\mathrm{ord}_7(2) = 3$.
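The example above can be checked with a short sketch that lists the powers of d modulo n until the sequence returns to 1. The function names are hypothetical and the code is only a direct, sequential restatement of the definitions, assuming $n \ge 2$ and $\gcd(d, n) = 1$.

```python
from math import gcd

def units(n):
    """Elements of Z_n^*: residues in 1..n-1 that are relatively prime to n."""
    return [d for d in range(1, n) if gcd(d, n) == 1]

def generated_subgroup(d, n):
    """List d^0, d^1, d^2, ... modulo n until the sequence returns to 1."""
    if n < 2 or gcd(d, n) != 1:
        raise ValueError("d must be relatively prime to n, and n >= 2")
    powers, value = [], 1
    while True:
        powers.append(value)
        value = (value * d) % n
        if value == 1:
            return powers

def order(d, n):
    """ord_n(d): the number of elements in the subgroup generated by d."""
    return len(generated_subgroup(d, n))

def is_primitive_root(d, n):
    """True if d generates all of Z_n^*."""
    return order(d, n) == len(units(n))

print(generated_subgroup(2, 7), order(2, 7))  # [1, 2, 4] 3
```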

If $\mathrm{ord}_n(M)$ is equal to the number of elements in $Z_n^*$, then every element in $Z_n^*$ is a power of M modulo n, and we say that M is a primitive root or a generator of $Z_n^*$ [32]. For example, 3 is a primitive root modulo 7, and $\langle 3 \rangle = \{1, 3, 2, 6, 4, 5\}$. If $Z_n^*$ possesses a primitive root, we say that the group $Z_n^*$ is cyclic. If M is a primitive root of $Z_n^*$ and C is any element of $Z_n^*$, then there exists an e such that $M^e \equiv C \pmod n$. This e is called the discrete logarithm of C modulo n, to the base M. No known method can solve the problem of discrete logarithm in a reasonable amount of time. The following method is used to compute C such that $M^e \equiv C \pmod n$ [33].

Procedure Encryption(M, e, n)

(1) Let $e_{k-1} \ldots e_0$ be the binary representation of e.

(2) C = 1.

(3) For i = k − 1 down to 0

(3a) Set C to the remainder of ($C^2$) when divided by n.

(3b) If $e_i = 1$ then

(3c) Set C to the remainder of (C * M) when divided by n.

EndFor

(4) Halt. Now C is the result of $M^e \pmod n$.

EndProcedure
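The procedure above is the usual square-and-multiply method, scanning the bits of e from the most significant to the least significant. A minimal Python sketch follows; the function name `encryption` and the optional parameter `k` (the bit length, defaulting to the bit length of e) are assumptions for illustration.

```python
def encryption(M, e, n, k=None):
    """Compute M**e mod n by left-to-right square-and-multiply."""
    if k is None:
        k = max(e.bit_length(), 1)   # Step (1): number of bits of e
    C = 1                            # Step (2)
    for i in range(k - 1, -1, -1):   # Step (3): i = k-1 down to 0
        C = (C * C) % n              # Step (3a): square
        if (e >> i) & 1:             # Step (3b): is bit e_i equal to 1?
            C = (C * M) % n          # Step (3c): multiply by M
    return C                         # Step (4)

# Worked example with M = 3, e = 5 (binary 101), n = 7:
# C: 1 -> 1*1*3 = 3 -> 3*3 = 9 mod 7 = 2 -> 2*2*3 = 12 mod 7 = 5
print(encryption(3, 5, 7))  # 5
```

The result agrees with Python's built-in three-argument `pow(M, e, n)`.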


4.2 The pseudo algorithm for solving discrete logarithms

Assume that the length of e is k bits. Also suppose that e is represented as a k-bit binary number, $e_{k-1} \ldots e_0$, where the value of each bit $e_j$ is either 1 or 0 for $0 \le j \le k-1$. The bits $e_{k-1}$ and $e_0$ represent the most significant bit and the least significant bit of e, respectively. The expression $M^e \pmod n$ can be transformed into another form: $(\ldots((1 * M^{e_{k-1}} \bmod n)^2 * M^{e_{k-2}} \bmod n)^2 * M^{e_{k-3}} \bmod n \ldots)^2 * M^{e_0} \bmod n$. In the Diffie–Hellman public-key cryptosystem, n is a prime number. Therefore, in this paper, we also assume that n is a prime number. Because n is a prime number, $\langle M \rangle = \{M^0 \pmod n, M^1 \pmod n, \ldots, M^{n-2} \pmod n\}$. That is to say, $0 \le e \le n-2$. The following pseudo algorithm is applied to solve the problem of discrete logarithm.

Method 1 Solving the problem of discrete logarithm.

(1) All of the computations for $M^0 \pmod n, M^1 \pmod n, \ldots, M^{n-2} \pmod n$ are simultaneously performed on a molecular computer.

(2) For any given C, from the result finished in Step (1), find e such that $M^e \equiv C \pmod n$.

(3) Output(“discrete logarithm is:”, e).

EndMethod
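On a conventional machine, Method 1 corresponds to an exhaustive table lookup. The sketch below is a sequential stand-in for the parallel Step (1), so it takes on the order of n steps rather than exploiting molecular parallelism; the function names are hypothetical.

```python
def discrete_log_table(M, n):
    """Step (1), done sequentially: map M**e mod n to e for e = 0 .. n-2."""
    table = {}
    value = 1
    for e in range(n - 1):
        table.setdefault(value, e)   # keep the smallest exponent for each value
        value = (value * M) % n
    return table

def discrete_log(M, C, n):
    """Steps (2)-(3): look up the exponent e with M**e = C (mod n), or None."""
    return discrete_log_table(M, n).get(C % n)

print("discrete logarithm is:", discrete_log(3, 6, 7))  # 3, since 3**3 = 27 = 6 (mod 7)
```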

Proof Step (1) in Method 1 is used to simultaneously complete all of the computations for $M^0 \pmod n, M^1 \pmod n, \ldots, M^{n-2} \pmod n$. This implies that the value of every element in $\langle M \rangle$ is determined after Step (1) is carried out.

Then, Step (2) in Method 1 is applied to search for C among the (n − 1) elements in $\langle M \rangle$. When the value of the eth element in $\langle M \rangle$ is equal to C, e is the answer (the discrete logarithm of C). Finally, Step (3) in Method 1 is employed to output the answer.

4.3 The algorithm for computation of discrete logarithms

The procedure, Encryption(M, e, n), defined in Sect. 4.1, is used to finish computation of an exponential modular operation. The following DNA algorithm is applied to implement the procedure, Encryption(M, e, n).

Algorithm 1 Implementing the procedure, Encryption(M, e, n)

(0) $T_0 \leftarrow \emptyset$; $T_\theta \leftarrow \emptyset$; $T_n \leftarrow \emptyset$; $T_1 \leftarrow \emptyset$.

(1) Init($T_0$).

(2) SelectDiscreteLogarithm($T_0$, $T_\theta$).

(3) MakeValue($T_n$).

(4) InitialValue($T_0$).

(5) For j = k − 1 down to 0

(5a) ModularMultiplication($T_0$, $T_n$, (2 * (k − 1 − j)) * (4 * k + 1) + 1, 2 * (k − j), C, C).

(5b) $T_0 = +(T_0, e_j^1)$ and $T_1 = -(T_0, e_j^1)$.

(5c) ModularMultiplication($T_0$, $T_n$, (2 * (k − 1 − j) + 1) * (4 * k + 1) + 1, 2 * (k − j) + 1, C, M).

(5d) For r = 0 to 4 * k

(5e) ReservedValue($T_1$, (2 * (k − 1 − j) + 1) * (4 * k + 1) + r).

EndFor

(5f) AssignmentOperator($T_1$, (2 * (k − 1 − j) + 1) * (4 * k + 1) + 1 + 4 * k, 2 * (k − j) + 1).

(5g) $T_0 = \cup(T_0, T_1)$.

EndFor

EndAlgorithm

Theorem 1 From the steps in Algorithm 1, the problem of discrete logarithm can be solved.

Proof From the execution of Step (0), tubes $T_0$, $T_\theta$, $T_n$, and $T_1$ are set to empty tubes. On the execution of Step (1), it calls Init($T_0$) to construct the solution space for $2^k$ possible discrete logarithms. This means that tube $T_0$ includes strands encoding $2^k$ possible discrete logarithms. Next, the execution of Step (2) calls SelectDiscreteLogarithm($T_0$, $T_\theta$) to select the legal discrete logarithms, whose range is from 0 to n − 2. This implies that these legal discrete logarithms are encoded in tube $T_0$. On the execution of Step (3), it calls MakeValue($T_n$) to encode a prime number, n. This indicates that tube $T_n$ contains a strand encoding it. Next, the execution of Step (4) calls InitialValue($T_0$) to finish the execution of Step (2) in the procedure, Encryption(M, e, n). That is to say, the initial value of C is set to one.

Step (5) is a loop and is mainly used to finish the function of the only loop (Step (3)) in the procedure, Encryption(M, e, n). The first execution of Step (5a) calls ModularMultiplication($T_0$, $T_n$, (2 * (k − 1 − j)) * (4 * k + 1) + 1, 2 * (k − j), C, C) to perform Step (3a) in Encryption(M, e, n). On the first execution of Step (5b), it employs the extract operation to form two tubes: $T_0$ and $T_1$. The first tube $T_0$ includes all of the strands that have $e_j = 1$. The second tube $T_1$ consists of all of the strands that have $e_j = 0$. This indicates that the execution of the step finishes Step (3b) in Encryption(M, e, n). Because the jth bit of e encoded in tube $T_0$ is one, the first execution of Step (5c) calls ModularMultiplication($T_0$, $T_n$, (2 * (k − 1 − j) + 1) * (4 * k + 1) + 1, 2 * (k − j) + 1, C, M) to perform Step (3c) in Encryption(M, e, n). Since the jth bit of e encoded in tube $T_1$ is zero, Step (5d) is a loop that is mainly used to maintain the consistency of the intermediate value for Y. On the first execution of Step (5e), it calls ReservedValue($T_1$, (2 * (k − 1 − j) + 1) * (4 * k + 1) + r) to copy the current intermediate value of Y to the next intermediate value of Y. Step (5e) is repeated until the value of r reaches (4 * k). Next, the first execution of Step (5f) calls AssignmentOperator($T_1$, (2 * (k − 1 − j) + 1) * (4 * k + 1) + 1 + 4 * k, 2 * (k − j) + 1) to update the value of C. Because the jth bit of e encoded in tube $T_1$ is zero, the updated value of C is still equal to the previous value.

On the first execution of Step (5g), it uses the merge operation to pour tube $T_1$ into $T_0$. Steps (5a) through (5g) are repeated until the value of j is zero. After all of the steps are processed, every strand in tube $T_0$ encodes the result of an exponential modular operation, $M^e \pmod n$. This implies that Algorithm 1 performs Step (1) in the pseudo algorithm, Method 1, in Sect. 4.2. Therefore, the problem of discrete logarithm can be solved from those steps in Algorithm 1.
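The control flow of Algorithm 1 can be illustrated with a small simulation in which each strand is a record holding a candidate exponent e and its running value C. This is a sketch under simplifying assumptions: the bit-level modules (ModularMultiplication, ReservedValue, AssignmentOperator) are replaced by ordinary integer arithmetic, Python lists stand in for tubes, and the function name `algorithm1` is hypothetical.

```python
def algorithm1(M, n, k):
    """Simulate Algorithm 1: return {e: M**e mod n} for every legal e."""
    # Steps (1)-(2): all k-bit candidates, restricted to 0 <= e <= n - 2.
    # Step (4): every strand starts with C = 1.
    T0 = [{"e": e, "C": 1} for e in range(2 ** k) if e <= n - 2]
    for j in range(k - 1, -1, -1):                  # Step (5)
        for s in T0:                                # (5a): C = C * C mod n
            s["C"] = (s["C"] * s["C"]) % n
        T_on = [s for s in T0 if (s["e"] >> j) & 1]         # (5b): e_j = 1
        T_off = [s for s in T0 if not (s["e"] >> j) & 1]    # (5b): e_j = 0
        for s in T_on:                              # (5c): C = C * M mod n
            s["C"] = (s["C"] * M) % n
        # (5d)-(5f): strands with e_j = 0 carry their value of C forward unchanged.
        T0 = T_on + T_off                           # (5g): merge
    return {s["e"]: s["C"] for s in T0}

results = algorithm1(3, 7, 3)
# Method 1, Step (2): search the results for the given C.
print([e for e, c in results.items() if c == 6])  # [3]
```

Each pass of the loop applies the same operation to every strand at once, which is the source of the parallelism claimed for the molecular version; the simulation simply iterates over the strands.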


5 Algorithm modules

We now describe, in detail, the various modules that are combined to form the overall DNA-based algorithm for solving the problem of discrete logarithm.

5.1 Construction of initial solution space for discrete logarithms

From [30,31], for every bit $e_j$ in the discrete logarithm e, two distinct 15-base value sequences are designed. One represents the value “0” for $e_j$ and the other represents the value “1” for $e_j$. For the sake of convenience in our presentation, assume that $e_j^1$ denotes the value of $e_j$ to be 1 and $e_j^0$ denotes the value of $e_j$ to be 0. We first describe the module Init($T_0$), which constructs an initial multiset of $2^k$ binary strings, each representing a possible discrete logarithm. After the initial construction has been completed, we return a tube $T_0$ containing binary strings encoding the possible discrete logarithms $0 \ldots 2^k - 1$.

Procedure Init($T_0$)

(0) $T_1 \leftarrow \emptyset$; $T_2 \leftarrow \emptyset$.

(0a) Append-head($T_1$, $e_0^1$).

(0b) Append-head($T_2$, $e_0^0$).

(0c) $T_0 = \cup(T_1, T_2)$.

For j = 1 to k − 1

(1a) Amplify($T_0$, $T_1$, $T_2$).

(1b) Append-head($T_1$, $e_j^1$).

(1c) Append-head($T_2$, $e_j^0$).

(1d) $T_0 = \cup(T_1, T_2)$.

EndFor

EndProcedure
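The doubling behaviour of Init can be seen in the following sketch, which mirrors Steps (0) to (1d) with Python lists standing in for tubes and tuples of symbolic labels standing in for strands. The label format and the helper `strand_value` are illustrative assumptions.

```python
def init(k):
    """Build the 2**k strands e_{k-1} ... e_0, following Steps (0)-(1d)."""
    T1 = [("e0^1",)]                  # (0a): T1 holds only e_0 = 1
    T2 = [("e0^0",)]                  # (0b): T2 holds only e_0 = 0
    T0 = T1 + T2                      # (0c): merge
    for j in range(1, k):
        T1, T2 = list(T0), list(T0)   # (1a): amplify T0 into two copies
        T0 = []                       #       the original tube becomes empty
        T1 = [(f"e{j}^1",) + s for s in T1]   # (1b): append-head e_j = 1
        T2 = [(f"e{j}^0",) + s for s in T2]   # (1c): append-head e_j = 0
        T0 = T1 + T2                  # (1d): merge
    return T0

def strand_value(strand):
    """Read a strand (most significant bit at the head) as an integer."""
    return int("".join(label[-1] for label in strand), 2)

tube = init(3)
print(len(tube), sorted(strand_value(s) for s in tube))  # 8 [0, 1, 2, 3, 4, 5, 6, 7]
```

After k − 1 rounds the tube holds $2^k$ distinct strands, so the number of strands doubles with each round while the number of operations grows only linearly in k.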

Lemma 1 The solution space for $2^k$ possible discrete logarithms can be constructed by the algorithm Init($T_0$).

Proof The algorithm, Init($T_0$), is implemented via the amplify, append-head and merge operations. On the execution of Step (0), it sets tubes $T_1$ and $T_2$ to empty tubes. Next, Step (0a) appends a DNA sequence, representing the value 1 for $e_0$, onto the head of every strand in tube $T_1$. This indicates that possible discrete logarithms with the value 1 in the first bit appear in tube $T_1$. Next, Step (0b) appends a DNA sequence, representing the value 0 for $e_0$, onto the head of every strand in tube $T_2$. That is to say, possible discrete logarithms with the value 0 in the first bit appear in tube $T_2$. Next, Step (0c) is used to pour tubes $T_1$ and $T_2$ into tube $T_0$. This implies that DNA strands in tube $T_0$ contain DNA sequences of $e_0 = 1$ and $e_0 = 0$, tube $T_1 = \emptyset$, and tube $T_2 = \emptyset$.


Each time Step (1a) is used to amplify tube $T_0$ and to generate two new tubes, $T_1$ and $T_2$, which are copies of $T_0$, tube $T_0$ becomes empty. Then, Step (1b) is applied to append a DNA sequence, representing the value 1 for $e_j$, onto the head of every strand in tube $T_1$. This means that possible discrete logarithms with the value 1 in the (j + 1)th bit appear in tube $T_1$. Step (1c) is also employed to append a DNA sequence, representing the value 0 for $e_j$, onto the head of every strand in tube $T_2$. This implies that possible discrete logarithms with the value 0 in the (j + 1)th bit appear in tube $T_2$. Next, Step (1d) is used to pour tubes $T_1$ and $T_2$ into tube $T_0$. This indicates that DNA strands in tube $T_0$ include DNA sequences of $e_j = 1$ and $e_j = 0$. After repeated execution of Steps (1a) to (1d), the algorithm finally produces tube $T_0$, which consists of $2^k$ DNA sequences representing $2^k$ possible discrete logarithms, tube $T_1 = \emptyset$, and tube $T_2 = \emptyset$. Therefore, it is inferred that the solution space for $2^k$ possible discrete logarithms can be constructed.

5.2 Solution space for $\mathrm{ord}_n(M)$

Because $\mathrm{ord}_n(M)$ is equal to n − 1, suppose that n − 1 is represented as a k-bit binary number, $\theta_{k-1} \ldots \theta_0$, where the value of each bit $\theta_j$ is either 1 or 0 for $0 \le j \le k-1$. The bits $\theta_{k-1}$ and $\theta_0$ are used to represent the most significant bit and the least significant bit of n − 1, respectively. From [30,31], for every bit $\theta_j$, two distinct 15-base value sequences are designed. One represents the value “0” for $\theta_j$ and the other represents the value “1” for $\theta_j$. For the sake of convenience in our presentation, assume that $\theta_j^1$ denotes the value of $\theta_j$ to be 1 and $\theta_j^0$ denotes the value of $\theta_j$ to be 0. The following algorithm, SelectDiscreteLogarithm($T_0$, $T_\theta$), is proposed to construct a DNA strand for encoding n − 1 and to select legal discrete logarithms.

Procedure SelectDiscreteLogarithm($T_0$, $T_\theta$)

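(1) For j = 0 to k − 1

(1a) Append-head($T_\theta$, $\theta_j$).

EndFor

(2) For j = k − 1 down to 0

(2a) $T_0^{ON} = +(T_0, e_j^1)$ and $T_0^{OFF} = -(T_0, e_j^1)$.

(2b) $T_\theta^{ON} = +(T_\theta, \theta_j^1)$ and $T_\theta^{OFF} = -(T_\theta, \theta_j^1)$.

(2c) If (Detect($T_\theta^{ON}$) == true) then

(2d) $T_0^{=} = \cup(T_0^{=}, T_0^{ON})$ and $T_0^{<} = \cup(T_0^{<}, T_0^{OFF})$.

Else

(2e) $T_0^{>} = \cup(T_0^{>}, T_0^{ON})$ and $T_0^{=} = \cup(T_0^{=}, T_0^{OFF})$.

EndIf

(2f) $T_\theta = \cup(T_\theta^{ON}, T_\theta^{OFF})$.

(2g) Discard($T_0^{>}$).

(2h) $T_0 = \cup(T_0, T_0^{=})$.

EndFor

(3) Discard($T_0$).

(4) $T_0 = \cup(T_0, T_0^{<})$.

EndProcedure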
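The selection loop is a most-significant-bit-first comparison of every candidate against n − 1. The sketch below follows Steps (2a) to (4) with integers standing in for strands and lists standing in for tubes; the function name and the variable names `T_on`, `T_off`, `T_eq`, `T_lt` are assumptions that mirror the tubes $T_0^{ON}$, $T_0^{OFF}$, $T_0^{=}$, $T_0^{<}$.

```python
def select_discrete_logarithm(T0, n, k):
    """Keep the candidates e with 0 <= e <= n - 2, comparing bit by bit with n - 1."""
    theta = n - 1                 # Step (1): the value encoded in tube T_theta
    T_lt = []                     # strands already known to be less than n - 1
    for j in range(k - 1, -1, -1):                      # Step (2)
        T_on = [e for e in T0 if (e >> j) & 1]          # (2a): e_j = 1
        T_off = [e for e in T0 if not (e >> j) & 1]     # (2a): e_j = 0
        if (theta >> j) & 1:                            # (2b)-(2c): theta_j = 1
            T_eq = T_on                                 # (2d): still equal so far
            T_lt = T_lt + T_off                         # (2d): now less than n - 1
        else:
            T_eq = T_off                                # (2e): still equal so far
            # (2e), (2g): T_on is greater than n - 1 and is discarded
        T0 = T_eq                                       # (2h)
    # Step (3): discard T0, which now holds only e = n - 1.
    # Step (4): the strands less than n - 1 become the new T0.
    return sorted(T_lt)

print(select_discrete_logarithm(list(range(8)), 7, 3))  # [0, 1, 2, 3, 4, 5]
```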


Lemma 2 The algorithm, SelectDiscreteLogarithm($T_0$, $T_\theta$), can be applied to encode n − 1 and to select the legal discrete logarithms, whose range is from 0 to n − 2, from the solution space.

Proof The algorithm, SelectDiscreteLogarithm($T_0$, $T_\theta$), is implemented via the append-head, extract, detect, merge and discard operations. The first loop in the algorithm is mainly used to construct a DNA strand for n − 1. Each time Step (1a) is used, it appends a DNA sequence, encoding the value “1” or “0” of $\theta_j$, onto the head of every strand in tube $T_\theta$. After repeated execution of Step (1a), it finally produces tube $T_\theta$, which includes a DNA strand encoding n − 1. Therefore, it is inferred that the solution space for n − 1 can be constructed.

The second loop is mainly used to finish the selection of legal discrete logarithms. Each execution of Step (2a) employs the extract operation to form two test tubes: $T_0^{ON}$ and $T_0^{OFF}$. The first tube $T_0^{ON}$ includes all of the strands that have $e_j = 1$. The second tube $T_0^{OFF}$ consists of all of the strands that have $e_j = 0$. On each execution of Step (2b), it uses the extract operation to form two test tubes: $T_\theta^{ON}$ and $T_\theta^{OFF}$. The first tube $T_\theta^{ON}$ includes all of the strands that have $\theta_j = 1$. The second tube $T_\theta^{OFF}$ consists of all of the strands that have $\theta_j = 0$. Next, each execution of Step (2c) uses the detect operation to check whether there is any DNA sequence in tube $T_\theta^{ON}$. If it returns true, this indicates that the value of the jth bit in n − 1 is one. On each execution of Step (2d), it uses the merge operations to pour $T_0^{ON}$ into $T_0^{=}$ and also to pour $T_0^{OFF}$ into $T_0^{<}$. If the detect operation in Step (2c) returns false, this indicates that the value of the jth bit in n − 1 is zero. Hence, each execution of Step (2e) applies the merge operations to pour $T_0^{ON}$ into $T_0^{>}$ and also to pour $T_0^{OFF}$ into $T_0^{=}$. On each execution of Step (2f), it applies the merge operations to pour $T_\theta^{ON}$ and $T_\theta^{OFF}$ into $T_\theta$. Then, because the encoded value of the DNA strands in tube $T_0^{>}$ is greater than n − 1, each execution of Step (2g) employs the discard operation to discard $T_0^{>}$. On each execution of Step (2h), it applies the merge operations to pour $T_0^{=}$ into $T_0$. After repeated execution of Steps (2a) to (2h), it finally produces tubes $T_0$ and $T_0^{<}$. Tube $T_0$ contains the DNA strands whose encoded value is equal to n − 1. Tube $T_0^{<}$ includes the DNA strands whose encoded value is less than n − 1. Next, on the execution of Step (3), it applies the discard operation to discard $T_0$, which contains the DNA strand whose encoded value is equal to n − 1. Finally, the execution of Step (4) uses the merge operations to pour $T_0^{<}$ into $T_0$. This indicates that DNA strands encoding legal discrete logarithms are retained in $T_0$. Therefore, it is inferred that the selection of legal discrete logarithms, whose range is from 0 to n − 2, can be performed.

5.3 Solution space for modulus n

Assume that the length of n denoted in Sect. 3.1 is k bits. Also suppose that n is represented as a k-bit binary number, $n_{k-1} \ldots n_0$, where the value of each bit $n_j$ is either 1 or 0 for $0 \le j \le k-1$. The bits $n_{k-1}$ and $n_0$ represent the most significant bit and the least significant bit of n, respectively. From [30,31], for every bit $n_j$, two distinct 15-base value sequences are designed. One represents the value “0” for $n_j$ and the other represents the value “1” for $n_j$. For the sake of convenience in our presentation, assume that $n_j^1$ denotes the value of $n_j$ to be 1 and $n_j^0$ denotes the value of $n_j$ to be 0. The following algorithm, MakeValue($T_n$), is proposed to construct a DNA strand for encoding n.

Procedure MakeValue($T_n$)

For j = 0 to k − 1

(1a) Append-head($T_n$, $n_j$).

EndFor

EndProcedure

Lemma 3 Solution space of n can be constructed from the algorithm, MakeValue(Tn).

Proof Similar to Lemmas 1 and 2.

5.4 Solution space for a primitive root M and the result of an exponential modular operation C

Suppose that the length of a primitive root M for $Z_n^*$ is k bits. Also assume that M is represented as a k-bit binary number, $m_{k-1} \ldots m_0$, where the value of each bit $m_j$ is either 1 or 0 for $0 \le j \le k-1$. The bits $m_{k-1}$ and $m_0$ represent the most significant bit and the least significant bit of M, respectively. From [30,31], for every bit $m_j$, two distinct 15-base value sequences are designed. One represents the value “0” for $m_j$ and the other represents the value “1” for $m_j$. For the sake of convenience in our presentation, assume that $m_j^1$ denotes the value of $m_j$ to be 1 and $m_j^0$ denotes the value of $m_j$ to be 0.

Assume that the length of C, the result of an exponential modular operation denoted in Sect. 4.1, is k bits. From the procedure Encryption(M, e, n), C is finally obtained after its value has been updated at most (2 ∗ k + 1) times. Therefore, suppose that each value of C is represented as a k-bit binary number, c_{a,k−1} ... c_{a,0}, where the value of each bit c_{a,j} is either 1 or 0 for 1 ≤ a ≤ (2 ∗ k + 1) and 0 ≤ j ≤ k − 1. The bits c_{a,k−1} and c_{a,0} represent the most significant bit and the least significant bit for C, respectively. The first k-bit binary number, c_{1,k−1} ... c_{1,0}, is used to represent the initial value of C. The last k-bit binary number, c_{2∗k+1,k−1} ... c_{2∗k+1,0}, is used to represent the final result of C. The other k-bit binary numbers are applied to represent the intermediate computed forms of C. From [30, 31], for every bit c_{a,j}, two distinct 15-base value sequences were designed. One represents the value “0” for c_{a,j} and the other represents the value “1” for c_{a,j}. For the sake of convenience in our presentation, assume that c_{a,j}^1 denotes the value of c_{a,j} to be 1 and c_{a,j}^0 defines the value of c_{a,j} to be 0. The following algorithm is used to construct the solution space for the initial value of C and the primitive root M.

Procedure InitialValue(T0)

(1) For j = 0 to k − 1

(1a) Append-head(T0, m_j).

EndFor

(2) Append-head(T0, c_{1,0}^1).

(3) For j = 1 to k − 1

(3a) Append-head(T0, c_{1,j}^0).

EndFor

EndProcedure

Lemma 4 Solution space for the initial value of C and the primitive root M can be constructed from the algorithm, InitialValue(T0).

Proof Similar to Lemmas 1 and 2.
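The strand produced by InitialValue(T0) can be checked with a small digital simulation. The Python sketch below is only an illustration under stated assumptions: a tube is a list of strands, each value sequence is replaced by a text label, Append-head on an empty tube is assumed to create one new strand, and the helper names append_head, initial_value and decode are hypothetical.

```python
def append_head(tube, symbol):
    """Prepend `symbol` to every strand (assumption: empty tube -> one new strand)."""
    if not tube:
        return [(symbol,)]
    return [(symbol,) + strand for strand in tube]


def initial_value(m, k):
    """Encode the k-bit primitive root m and the initial value C = 1 on one strand."""
    tube = []
    for j in range(k):                                  # Step (1)
        tube = append_head(tube, f"m{j}^{(m >> j) & 1}")
    tube = append_head(tube, "c1,0^1")                  # Step (2): lowest bit of C is 1
    for j in range(1, k):                               # Step (3): remaining bits of C are 0
        tube = append_head(tube, f"c1,{j}^0")
    return tube


def decode(strand, prefix):
    """Read back the integer stored in the labels that start with `prefix`."""
    value = 0
    for symbol in strand:
        if symbol.startswith(prefix):
            name, bit = symbol.split("^")
            value |= int(bit) << int(name[len(prefix):])
    return value


if __name__ == "__main__":
    strand = initial_value(5, 3)[0]
    print(strand)
    print(decode(strand, "m"), decode(strand, "c1,"))
```

Read from the head, the strand holds c_{1,k−1} ... c_{1,0} followed by m_{k−1} ... m_0, so the first k-bit number for C encodes the value 1.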

5.5 The algorithm for computation of a modular multiplication

The procedure, Encryption(M, e, n), denoted in Sect. 4.1, is used to finish the computation of an exponential modular operation. The procedure uses successive operations of squaring and multiplication to perform the exponential modular operation. We now give details of the ModularMultiplication(T0, Tn, f, a, α, β) module used by the main algorithm. The following DNA-based algorithm, ModularMultiplication(T0, Tn, f, a, α, β), is applied to perform all of the steps of a modular multiplication. This implies that Steps (3a) and (3c) in the procedure, Encryption(M, e, n), are performed through this algorithm. The two parameters, α and β, in ModularMultiplication(T0, Tn, f, a, α, β) represent the multiplicand and the multiplier of a modular multiplication. Assume that β_j^1 is applied to represent the value of “1” for the jth bit of the multiplier (β).

Procedure ModularMultiplication(T0, Tn, f, a, α, β)

(1) InitialSet(T0, f).

(2) For j = k − 1 down to 0

(2a) ParallelLeftShifter(T0, f + (k − 1 − j) ∗ 4).

(2b) ParallelComparator(T0, Tn, T0^>, T0^=, T0^<, f + (k − 1 − j) ∗ 4 + 1).

(2c) T0 = ∪(T0^>, T0^=).

(2d) BinaryParallelSubtractor(T0, f + (k − 1 − j) ∗ 4 + 1).

(2e) ReservedValue(T0^<, f + (k − 1 − j) ∗ 4 + 1).

(2f) T0 = ∪(T0, T0^<).

(2g) T0 = +(T0, β_j^1) and T1 = −(T0, β_j^1).

(2h) If (Detect(T0) == true) then

(2i) BinaryParallelAdder(T0, f + (k − 1 − j) ∗ 4 + 2, a).

(2j) ParallelComparator(T0, Tn, T0^>, T0^=, T0^<, f + (k − 1 − j) ∗ 4 + 3).

(2k) T0 = ∪(T0^>, T0^=).

(2l) BinaryParallelSubtractor(T0, f + (k − 1 − j) ∗ 4 + 3).

(2m) ReservedValue(T0^<, f + (k − 1 − j) ∗ 4 + 3).

(2n) T0 = ∪(T0, T0^<).

EndIf

(2o) If (Detect(T1) == true) then

(2p) ReservedValue(T1, f + (k − 1 − j) ∗ 4 + 2).
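The steps listed for ModularMultiplication resemble the classical shift-and-add method for computing a product modulo n: shift left, reduce once, add the multiplicand when the multiplier bit is 1, and reduce once more. The Python sketch below works on ordinary integers rather than DNA strands and is offered only as an aid to reading the steps. It assumes 0 ≤ α < n and β < 2^k, and it assumes that Encryption(M, e, n) scans the exponent from the most significant bit, which is not stated in this section. The function names are hypothetical.

```python
def modular_multiplication(alpha, beta, n, k):
    """Return (alpha * beta) mod n by shift-and-add.

    Assumptions: 0 <= alpha < n, 0 <= beta < 2**k, n > 1.
    """
    result = 0
    for j in range(k - 1, -1, -1):      # For j = k - 1 down to 0
        result <<= 1                    # left shift, cf. Step (2a)
        if result >= n:                 # comparison with n, cf. Steps (2b)-(2c)
            result -= n                 # subtraction of n, cf. Step (2d)
        if (beta >> j) & 1:             # j-th multiplier bit is 1, cf. Steps (2g)-(2h)
            result += alpha             # addition, cf. Step (2i)
            if result >= n:             # comparison with n, cf. Steps (2j)-(2k)
                result -= n             # subtraction of n, cf. Step (2l)
    return result


def modular_exponentiation(m, e, n, k):
    """Square-and-multiply (assumed most-significant-bit first); counts updates of C."""
    c = 1
    updates = 1                         # the initial value of C
    for j in range(k - 1, -1, -1):
        c = modular_multiplication(c, c, n, k)      # square
        updates += 1
        if (e >> j) & 1:
            c = modular_multiplication(c, m, n, k)  # multiply
            updates += 1
    return c, updates


if __name__ == "__main__":
    print(modular_multiplication(7, 9, 11, 4))
    print(modular_exponentiation(2, 9, 11, 4))
```

Since the running value stays below n, a single subtraction after each shift or addition is enough to reduce it, and C is assigned at most 2 ∗ k + 1 times, in line with the bound stated in Sect. 5.4.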


rithms [3] known at present. In Shor’s quantum factoring and discrete logarithm algorithm [39], the two main components, modular exponentiation (computation of a^x mod n) and the inverse quantum Fourier transform (QFT), take only O(k^3) operations. In this article, our molecular discrete logarithm algorithm demonstrates theoretically how basic biological operations can be used to solve the problem of discrete logarithms with O(k^3) biological operations. Both Shor’s factoring and discrete logarithm algorithm and our discrete logarithm algorithm need to deal simultaneously with 2^1024 bits of information to find the discrete logarithm of 1024 bits used in the current Diffie–Hellman public-key cryptosystem. However, because of many current technical difficulties, neither algorithm can at present actually find a discrete logarithm of 1024 bits. This implies that if a quantum computer and a molecular computer are really constructed in the future (perhaps after many years), then Shor’s factoring and discrete logarithm algorithm and our discrete logarithm algorithm have very high feasibility for solving the problem of discrete logarithms.

In [42], it is demonstrated that the difficult problem of elliptic curve discrete logarithms can be solved on a DNA-based computer, and the application of DNA computing is proposed for another popular cryptosystem, ECC, which is more complex and poses a greater challenge in cryptanalysis. In [42], solving the elliptic curve discrete logarithm problem takes a series of steps that is polynomial in the input size, and it has also been shown that complex mathematical operations can be performed directly with basic biological operations.

Adleman [1] indicated that, in one time unit, 2^n combination states can be processed simultaneously by means of biological operations, whereas just one state can be processed in a digital computer. Adleman [1] therefore also pointed out that a digital computer will take exponential time to complete the digital-computer simulation of biological algorithms. This implies that the digital-computer simulation of the proposed biological algorithms for breaking public-key cryptosystems is perhaps inefficient.
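To make this contrast concrete, the following Python sketch shows how a digital computer would visit the candidate exponents one at a time. It is a plain sequential enumeration, not a simulation of the molecular operations themselves, and the function name is hypothetical. It assumes that m is a primitive root modulo n and that legal discrete logarithms lie in the range from 0 to n − 2.

```python
def discrete_log_by_enumeration(m, c, n):
    """Return (e, tried) with m**e % n == c and 0 <= e <= n - 2.

    `tried` is the number of candidate exponents examined.
    Returns (None, n - 1) if no exponent in the legal range matches.
    """
    value = 1                       # m**0 mod n
    for e in range(n - 1):          # legal exponents 0 .. n - 2
        if value == c:
            return e, e + 1
        value = (value * m) % n     # advance to m**(e + 1) mod n
    return None, n - 1


if __name__ == "__main__":
    # 2 is a primitive root modulo 11.
    print(discrete_log_by_enumeration(2, 6, 11))
```

For a k-bit modulus the loop may run close to 2^k times in the worst case, which is the exponential cost of a sequential search that the massive parallelism of a tube is meant to avoid.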

Acknowledgements The authors would like to give many thanks to Dr. Amos, who is the author of the 17th reference, for offering valuable information on Sect. 8 entitled “Biological Implementation”.

References

1. Feynman RP (1961) In: Gilbert DH (ed) Miniaturization. Reinhold, New York, pp 282–296

2. Adleman L (1994) Molecular computation of solutions to combinatorial problems. Science 266(11):1021–1024

3. Diffie W, Hellman M (1976) New directions in cryptography. IEEE Trans Inf Theory 22(6):644–654

4. Adleman L, Rothemund PWK, Roweis S, Winfree E (1999) On applying molecular computation to the data encryption standard. In: The 2nd annual workshop on DNA computing, Princeton University. DIMACS: series in discrete mathematics and theoretical computer science. Am Math Soc, Providence, pp 31–44

5. Guo M, Chang W-L, Ho M, Lu J, Cao J (2005) Is optimal solution of every NP-complete or NP-hard problem determined from its characteristic for DNA-based computing. BioSystems 80(1):71–82

6. Muskulus M, Besozzi D, Brijder R, Cazzaniga P, Houweling S, Pescini D, Rozenberg G (2006) Cycles and communicating classes in membrane systems and molecular dynamics. Theor Comput Sci 372(2–3):242–266

7. Reif JH, LaBean TH (2007) Autonomous programmable biomolecular devices using self-assembled DNA nanostructures. Commun ACM 50(9):46–53


9. Macdonald J, Li Y, Sutovic M, Lederman H, Pendri K, Lu W, Andrews BL, Stefanovic D, Stojanovic MN (2006) Medium scale integration of molecular logic gates in an automaton. Nano Lett 6(11):2598–2603

10. Ekani-Nkodo A, Kumar A, Fygenson DK (2004) Joining and scission in the self assembly of nanotubes from DNA tiles. Phys Rev Lett 93:268301

11. Dehnert M, Helm WE, Hütt M-Th (2006) Informational structure of two closely related eukaryotic genomes. Phys Rev E 74:021913

12. Müller BK, Reuter A, Simmel FC, Lamb DC (2006) Single-pair FRET characterization of DNA tweezers. Nano Lett 6:2814–2820

13. Dirks RM, Bois JS, Schaeffer JM, Winfree E, Pierce NA (2007) Thermodynamic analysis of interacting nucleic acid strands. SIAM Rev 49(1):65–88

14. Lipton R (1995) DNA solution of hard computational problems. Science 268:542–545

15. Yeh C-W, Chu C-P, Wu K-R (2006) Molecular solutions to the binary integer programming problem based on DNA computation. Biosystems 83(1):56–66

16. Guo M, Ho M, Chang W-L (2004) Fast parallel molecular solution to the dominating-set problem on massively parallel bio-computing. Parallel Comput 30(9–10):1109–1125

17. Amos M (2005) Theoretical and experimental DNA computation. Springer, Berlin

18. Ho M, Chang W-L, Guo M, Yang LT (2004) Fast parallel solution for set-packing and clique problems by DNA-based computing. IEICE Trans Inf Syst E-87D(7):1782–1788

19. Chang W-L, Guo M, Ho M (2004) Towards solution of the set-splitting problem on gel-based DNA computing. Future Gener Comput Syst 20(5):875–885

20. Chang W-L, Guo M (2003) Solving the set-cover problem and the problem of exact cover by 3-sets in the Adleman-Lipton’s model. BioSystems 72(3):263–275

21. Ho M (2005) Fast parallel molecular solutions for DNA-based supercomputing: the subset-product problem. BioSystems 80:233–250

22. Henkel CV, Bäck T, Kok JN, Rozenberg G, Spaink HP (2007) DNA computing of solutions to knapsack problems. Biosystems 88(1–2):156–162

23. Chang W-L (2007) Fast parallel DNA-based algorithms for molecular computation: the set-partition problem. IEEE Trans Nanobiosci 6(1):346–353

24. Chang W-L, Ho M, Guo M (2005) Fast parallel molecular algorithms for DNA-based computation: factoring integers. IEEE Trans Nanobiosci 4(2):149–163

25. Boneh D, Dunworth C, Lipton RJ (1996) Breaking DES using a molecular computer. In: Proceedings of the 1st DIMACS workshop on DNA based computers, 1995. DIMACS series in discrete mathematics and theoretical computer science, vol 27. Am Math Soc, Providence, pp 37–66

26. Zhang DY, Winfree E (2008) Dynamic allosteric control of noncovalent DNA catalysis reactions. J Am Chem Soc 130:13921–13926

27. Chang W-L, Ho M, Guo M (2004) Molecular solutions for the subset-sum problem on DNA-based supercomputing. BioSystems 73(2):117–130

28. Seelig G, Soloveichik D, Zhang D-Y, Winfree E (2006) Enzyme-free nucleic acid logic circuits. Science 314(5805):1585–1588

29. Kari L, Konstantinidis S, Sosík P (2005) On properties of bond-free DNA languages. Theor Comput Sci 334(1–3):131–159

30. Braich RS, Johnson C, Rothemund PWK, Hwang D, Chelyapov N, Adleman LM (2001) Solution of a satisfiability problem on a gel-based DNA computer. In: Proceedings of the 6th international conference on DNA computation. Lecture notes in computer science series, vol 2054. Springer, Berlin, pp 27–42

31. Adleman LM, Braich RS, Johnson C, Rothemund PWK, Hwang D, Chelyapov N (2002) Solution of a 20-variable 3-SAT problem on a DNA computer. Science 296(5567):499–502

32. Koblitz N (1987) A course in number theory and cryptography. Springer, Berlin. ISBN:0387942939

33. Rivest RL, Shamir A, Adleman L (1978) A method for obtaining digital signatures and public-key cryptosystem. Commun ACM 21:120–126

34. Blakley GR A computer algorithm for calculating product AB modulo M . IEEE Trans Comput c-32(5):497–500

35. Adams RL, Knowler JT, Leader DP (1986) The biochemistry of the nucleic acids, 10th edn. Chapman & Hall, London

36. Watson J, Gilman M, Witkowski J, Zoller M (1992) Recombinant DNA, 2nd edn. Scientific American Books

37. Breslauer K, Frank R, Blocker H, Marky L (1986) Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci 3746–3750


38. Brown T (1993) Genetics: a molecular approach. Chapman & Hall, London

39. Shor PW (1997) Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J Comput 26(5):1484–1509

40. Kershner RJ, Bozano LD, Micheel CM, Hung AH, Fornof AR, Cha JN, Rettner CT, Bersani M, Frommer J, Rothemund PWK, Wallraff GM (2009) Placement and orientation of individual DNA shapes on lithographically patterned surfaces. Nat Nanotechnol 16:557–561

41. Barish RD, Schulman R, Rothemund PWK, Winfree E (2009) An information-bearing seed for nucleating algorithmic self-assembly. PNAS 106:6054–6059

42. Li K, Zou S, Xv J (2008) Fast parallel molecular algorithms for DNA-based computation: solving the elliptic curve discrete logarithm problem over GF(2^n). J Biomed Biotechnol 2008:518093. doi:10.1155/2008/518093

Weng-Long Chang received the Ph.D. degree in Computer Science and Information Engineering from National Cheng Kung University, Taiwan, Republic of China, in 1999. He is currently a full Professor at the Department of Computer Science and Information Engineering in National Kaohsiung University of Applied Sciences. His research interests include quantum algorithms, adiabatic quantum algorithms, DNA-based algorithms, and languages and compilers for parallel computing.

Shu-Chien Huang received the Ph.D. degree in Computer Science and Information Engineering from National Cheng Kung University, Taiwan, Republic of China, in 1999. He is currently an assistant Professor at the Department of Computer Science in Pingtung University of Education. His research interests include image processing, quantum algorithms, and DNA-based algorithms.


Kawuu Weicheng Lin received the B.Sc. from the Department of Computer Science and Information Engineering, National Taiwan University (NTU), Taiwan, in 1999, and received his Ph.D. from the Department of Computer Science and Information Engineering, National Cheng-Kung University (NCKU), Taiwan, in 2006. Since August 2007, he has been an assistant Professor at the Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences (KUAS), Taiwan. His research interests include data mining and its applications, sensor technologies, and parallel and distributed computing. He is a member of the Phi Tau Phi honorary society, and won the Phi Tau Phi Scholastic Honor in 2006.

Michael (Shan-Hui) Ho received his Ph.D. degree in Information Systems and Management Science from University of Texas at Austin, USA, in 1988. He is a full Professor of both Computer Center and Institute of Electric Engineering in National Taipei University. His research interests are Bioinformatics Computing, Parallel Process and Computing, Software Engineering, and Computation.