Confronting two-pair primer design for enzyme-free SNP genotyping based on a genetic algorithm

METHODOLOGY ARTICLE

Open Access

Cheng-Hong Yang 1,2, Yu-Huei Cheng 1, Li-Yeh Chuang 3*, Hsueh-Wei Chang 4,5,6,7*

Abstract

Background: The polymerase chain reaction with confronting two-pair primers (PCR-CTPP) method uses four designed primers to produce allele-specific DNA bands of different lengths, so that single nucleotide polymorphism (SNP) genotyping can be read by electrophoresis without further steps. It is a time- and cost-effective SNP genotyping method that has the advantage of simplicity. However, computation of feasible CTPP primers is still challenging.

Results: In this study, we propose a genetic algorithm (GA)-based method to design a feasible CTPP primer set for a reliable PCR experiment. Dry dock experiments on 288 SNPs of the SLC6A4 gene indicated that the proposed algorithm provides CTPP primers that satisfy most primer constraints. One SNP, rs12449783 in the SLC6A4 gene, was taken as an example for genotyping experiments using electrophoresis, which validated that the GA-based design method provides reliable CTPP primer sets for SNP genotyping.

Conclusions: The GA-based CTPP primer design method evaluates all of the common primer constraints of PCR-CTPP. The GA-CTPP program is implemented in Java, and a user-friendly input interface is freely available at http://bio.kuas.edu.tw/ga-ctpp/.

Background

Genotyping is a common technique used in association studies of diseases and cancers. Although many high-throughput platforms for single nucleotide polymorphism (SNP) genotyping, such as SNP arrays [1] and real-time PCR using TaqMan probes [2], have been introduced, most laboratories still validate SNPs or novel mutations by PCR-restriction fragment length polymorphism (RFLP) genotyping [3-6] because this method is inexpensive for small-scale genotyping. One shortcoming of PCR-RFLP is the long digestion time (usually 2-3 hours) required by restriction enzymes [7,8].

Recently, a restriction enzyme-free SNP genotyping technique called “PCR with confronting two-pair primers (PCR-CTPP)” was developed [9-12]. It has been applied successfully to at least 30 different SNP genotypings. For example, interleukin-1B (IL-1B) C-31T, interleukin-2 (IL-2) -330G, beta2-adrenergic receptor (beta2-AR) Gln27Glu, and aldehyde dehydrogenase 2 (ALDH2) were genotyped by PCR-CTPP for association studies with smoking behavior [13], Helicobacter pylori-induced gastric atrophy [14], severe coronary artery disease [15], and esophageal cancer risk [16], respectively. These applications indicate that the PCR-CTPP method is suitable and reliable for most SNPs, and that it considerably reduces the consumption of restriction enzymes. However, the PCR-CTPP method tolerates only a small difference in melting temperature ($T_{m\text{-}diff}$) between the four primers [10]. Moreover, typical primer design constraints also need to be considered, such as primer length, primer length difference, PCR product length, GC proportion, melting temperature ($T_m$), GC clamp, dimers (including cross-dimers and self-dimers), hairpin structure, and specificity. Hence, the computational requirements of primer design for PCR-CTPP are rather high.

* Correspondence: [email protected]; [email protected]

3 Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan

4 Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan

Full list of author information is available at the end of the article

© 2010 Yang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

To design CTPP primers under these many constraints, we introduce a genetic algorithm (GA) [17,18] to improve the design of CTPP primer sets. GA is a stochastic search algorithm modeled on the process of natural selection underlying biological evolution [19]. It constitutes a randomized search and optimization technique that derives its working principle from natural genetics. Because GA computation is similar in nature to the evolution of species, it has been applied to SNP interaction analysis [20] and general primer design [21]. The evolutionary computations involved, such as selection, crossover and mutation, are effective in searching for optimal solutions among many candidate CTPP primer sets. In each run, the chromosomes in a GA share information with each other, and the superior solutions, ranked by a fitness rule, are refined from generation to generation. Therefore, CTPP primers obeying the typical primer design constraints described above can be found.
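The selection, crossover and mutation cycle described above can be sketched as follows. This is a minimal illustration rather than the GA-CTPP implementation: the function name `run_ga`, the tournament selection, the one-point crossover, and all parameter values are assumptions chosen only to make the loop concrete.

```python
import random


def run_ga(fitness, bounds, pop_size=20, generations=50,
           crossover_rate=0.6, mutation_rate=0.05, seed=0):
    """Minimise `fitness` over integer vectors.

    `bounds` is a list of (low, high) pairs, one per gene. All default
    parameter values are illustrative placeholders, not published settings.
    """
    rng = random.Random(seed)

    def random_chromosome():
        return [rng.randint(lo, hi) for lo, hi in bounds]

    def select(population):
        # Tournament selection (an assumption): keep the fitter of two
        # randomly drawn chromosomes; lower fitness is better.
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    population = [random_chromosome() for _ in range(pop_size)]
    best = min(population, key=fitness)

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = select(population), select(population)
            child = list(p1)
            if rng.random() < crossover_rate:
                # One-point crossover: head of p1 joined to tail of p2.
                cut = rng.randint(1, len(bounds) - 1)
                child = p1[:cut] + p2[cut:]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Mutation: redraw one gene inside its allowed range.
                    child[i] = rng.randint(lo, hi)
            children.append(child)
        population = children
        # Remember the best chromosome seen in any generation.
        best = min(population + [best], key=fitness)
    return best
```

Any function that maps a chromosome to a number to be minimised can be passed as `fitness`; the fixed `seed` makes a run repeatable.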

Methods

Problem formulation

The CTPP primer design problem can be described as follows. Let $T_D$ be a template DNA sequence composed of nucleotide codes with an identified SNP. $T_D$ is defined by:

$$T_D = \{ B_i \mid i \text{ is the index of the DNA sequence},\ 1 \le i \le \iota,\ \exists!\, B_i \in \text{IUPAC code of SNP} \} \tag{1}$$

where $B_i$ is a regular nucleotide code (A, T, C, or G) or, at exactly one position, an IUPAC code of the SNP (M, R, W, S, Y, K, V, H, D, B or N); $\iota$ is the length of the DNA template, and $\exists!$ denotes existence and uniqueness. For the target SNP, we focused only on true SNPs described in dbSNP [22] of NCBI, i.e., deletion/insertion polymorphisms (DIPs) and multi-nucleotide polymorphisms (MNPs) are not included.
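The uniqueness condition in formula (1) can be checked mechanically. The sketch below assumes the template is given as a plain string; the helper name `find_snp_index` is hypothetical and not part of GA-CTPP.

```python
# IUPAC codes listed in the text for the SNP position, and the regular bases.
IUPAC_SNP_CODES = set("MRWSYKVHDBN")
REGULAR_CODES = set("ATCG")


def find_snp_index(template):
    """Return the 1-based index of the single IUPAC SNP code in `template`.

    Raises ValueError if the template contains an unknown symbol or does
    not contain exactly one SNP code.
    """
    positions = []
    for i, base in enumerate(template.upper(), start=1):
        if base in IUPAC_SNP_CODES:
            positions.append(i)
        elif base not in REGULAR_CODES:
            raise ValueError(f"unexpected symbol {base!r} at index {i}")
    if len(positions) != 1:
        raise ValueError(
            f"expected exactly one SNP code, found {len(positions)}")
    return positions[0]
```

For example, `find_snp_index("ACGTRACGT")` returns 5, the position of the `R` (A/G) code.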

CTPP primer design requires two pairs of short sequences that are constrained within $T_D$ by a defined SNP site (Figure 1). The forward primer 1 ($P_{f1}$) is a short sense sequence located some distance upstream (5’ end) of the defined SNP site; the reverse primer 1 ($P_{r1}$) is a short antisense sequence which contains a nucleotide (the minor allele of the defined SNP site) at its 3’ end; the forward primer 2 ($P_{f2}$) is a short sense sequence which contains a nucleotide (the major allele of the defined SNP site) at its 3’ end; and the reverse primer 2 ($P_{r2}$) is a short antisense sequence located some distance downstream (3’ end) of the defined SNP site. These four primers are defined as follows:

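$$P_{f1} = \{ B_i \mid i \text{ is the index of } T_D,\ F_{s1} \le i \le F_{e1} \} \tag{2}$$

$$P_{r1} = \{ \bar{B}_i \mid i \text{ is the index of } T_D,\ R_{s1} \le i \le R_{e1} \} \tag{3}$$

$$P_{f2} = \{ B_i \mid i \text{ is the index of } T_D,\ F_{s2} \le i \le F_{e2} \} \tag{4}$$

$$P_{r2} = \{ \bar{B}_i \mid i \text{ is the index of } T_D,\ R_{s2} \le i \le R_{e2} \} \tag{5}$$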

where $P_{f1}$/$P_{r1}$ and $P_{f2}$/$P_{r2}$ are the two primer pairs of PCR-CTPP. $F_{s1}$ vs. $F_{e1}$ and $R_{s1}$ vs. $R_{e1}$ indicate the start index vs. the end index of $P_{f1}$ and $P_{r1}$ in $T_D$, respectively. $F_{s2}$ vs. $F_{e2}$ and $R_{s2}$ vs. $R_{e2}$ indicate the start index vs. the end index of $P_{f2}$ and $P_{r2}$ in $T_D$, respectively. $\bar{B}_i$ is the complementary nucleotide of $B_i$, which is described in formula (1). For example, if $B_i$ = ‘A’, then $\bar{B}_i$ = ‘T’; if $B_i$ = ‘C’, then $\bar{B}_i$ = ‘G’, and vice versa.
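As an illustration of formulas (2) to (5), the following sketch reads a primer out of a template string from a pair of 1-based, inclusive indices. The function names are hypothetical. Writing the antisense primer in reverse order, so that it reads 5’ to 3’, is a conventional assumption; the formulas themselves only specify the set of complemented bases. The SNP position is assumed to have been replaced by the chosen allele beforehand.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sense_primer(template, start, end):
    """Bases B_i with start <= i <= end (1-based, inclusive)."""
    return template[start - 1:end]


def antisense_primer(template, start, end):
    """Complemented bases of the same index range.

    The result is reversed so the primer is written 5'->3' (assumption).
    A KeyError is raised if the range still contains an IUPAC SNP code.
    """
    region = template[start - 1:end]
    return "".join(COMPLEMENT[b] for b in reversed(region))
```

For the toy template `"AACCGGTT"`, `sense_primer(template, 1, 4)` gives `"AACC"` and `antisense_primer(template, 1, 4)` gives `"GGTT"`.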

The target SNP site is defined at the end positions of $P_{f2}$ and $P_{r1}$, which are indicated by the symbols $F_{e2}$ and $R_{e1}$, respectively. As described in Figure 1, a vector ($P_v$) with $F_{l1}$, $P_{l1}$, $R_{l1}$, $F_{l2}$, $P_{l2}$ and $R_{l2}$ is essential to design the CTPP primer sets. This vector is defined as follows:

$$P_v = (F_{l1}, P_{l1}, R_{l1}, F_{l2}, P_{l2}, R_{l2}) \tag{6}$$

$F_{l1}$, $P_{l1}$, $R_{l1}$, $F_{l2}$, $P_{l2}$ and $R_{l2}$ represent the lengths of forward primer 1, the PCR product between $P_{f1}$ and $P_{r1}$, reverse primer 1, forward primer 2, the PCR product between $P_{f2}$ and $P_{r2}$, and reverse primer 2, respectively. Consequently, the forward and reverse primers can be acquired from $P_v$, which is the prototype of a chromosome in GA and is used to perform evolutionary computations as described in the following sections.
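One way to recover primer positions from $P_v$ is sketched below. The geometry is an assumption read from the description of Figure 1: $P_{r1}$ and $P_{f2}$ both have their 3’ end on the SNP, and each PCR product spans from the outer end of its forward primer to the outer end of its reverse primer. The function name `decode_vector` and the returned (low, high) template index pairs are hypothetical conveniences, not the notation of GA-CTPP.

```python
def decode_vector(snp, v):
    """Map P_v = (Fl1, Pl1, Rl1, Fl2, Pl2, Rl2) to template index ranges.

    `snp` is the 1-based index of the SNP in the template. Each primer is
    returned as a (low, high) pair of 1-based, inclusive template indices.
    The layout below is an assumed reading of Figure 1.
    """
    fl1, pl1, rl1, fl2, pl2, rl2 = v

    # Assumption: Pr1 and Pf2 overlap at the SNP with their 3' ends.
    r1 = (snp, snp + rl1 - 1)
    f2 = (snp - fl2 + 1, snp)

    # Assumption: product 1 runs from the start of Pf1 to the far end of Pr1.
    f1_low = r1[1] - pl1 + 1
    f1 = (f1_low, f1_low + fl1 - 1)

    # Assumption: product 2 runs from the start of Pf2 to the far end of Pr2.
    r2_high = f2[0] + pl2 - 1
    r2 = (r2_high - rl2 + 1, r2_high)

    return {"f1": f1, "r1": r1, "f2": f2, "r2": r2}
```

With `snp = 300` and `v = (20, 150, 22, 21, 250, 19)`, this layout places $P_{f1}$ at 172-191, $P_{r1}$ at 300-321, $P_{f2}$ at 280-300 and $P_{r2}$ at 511-529. A real implementation would also need to reject vectors whose ranges fall outside the template.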

Definition of the fitness function

The regular primer design constraints are combined into a fitness function, and the GA searches for primer sets that minimize the fitness value. The fitness function is defined as follows:

$$\begin{aligned} Fitness(P_v) ={}& 3 * (Len_{diff}(P_v) + GC_{proportion}(P_v) + GC_{clamp}(P_v)) \\ &+ 10 * (dimer(P_v) + hairpin(P_v) + specificity(P_v)) \\ &+ 50 * (T_m(P_v) + T_{m\text{-}diff}(P_v)) \\ &+ 100 * Avg\_T_{m\text{-}diff}(P_v) + 60 * PCR_{len}(P_v) \end{aligned} \tag{7}$$

The weights (3, 10, 50, 60 and 100) of the fitness function express the relative importance of the primer constraints. These weights are set according to experience with PCR-CTPP conditions. They can also be adjusted to suit the user’s experimental requirements.
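The weighted sum in formula (7) can be written compactly as below. This is only a sketch: the dictionary keys are hypothetical names for the individual constraint functions, and the pairing of the weights 100 and 60 with the last two terms follows formula (7) as read here, so it should be checked against the published equation before reuse.

```python
def fitness(terms):
    """Combine per-constraint penalty values into one fitness value.

    `terms` maps (hypothetical) constraint names to the values returned by
    the individual constraint functions; lower totals are better.
    """
    return (3 * (terms["len_diff"] + terms["gc_proportion"]
                 + terms["gc_clamp"])
            + 10 * (terms["dimer"] + terms["hairpin"]
                    + terms["specificity"])
            + 50 * (terms["tm"] + terms["tm_diff"])
            + 100 * terms["avg_tm_diff"]
            + 60 * terms["pcr_len"])
```

Keeping the weights in one place makes it straightforward to adjust them to a particular experimental setting, as the text allows.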

Primer length

The feasible primer length for a PCR experiment is set between 16 and 28 bp. The $T_m$ increases for longer primers and decreases for relatively short primers. Accordingly, primers which are neither too long nor too short are suitable. Because the random values of $F_{l1}$, $R_{l1}$, $F_{l2}$ and $R_{l2}$ are already limited to an appropriate range, primer length is not included as a term in the fitness function.

Yang et al. BMC Bioinformatics 2010, 11:509 http://www.biomedcentral.com/1471-2105/11/509

A length difference ($Len_{diff}$) of less than or equal to 3 bp between the $F_{l1}$/$R_{l1}$, $F_{l2}$/$R_{l2}$, and $F_{l1}$/$R_{l2}$ primer sets is considered optimal. The primer length difference function is defined as follows:

$$Len_{diff}(P_v) = \begin{cases} defect\_value = 3 \\ \text{if } ABS(F_{l1} - R_{l1}) \le 3, \text{ then } defect\_value - 1 \\ \text{if } ABS(F_{l2} - R_{l2}) \le 3, \text{ then } defect\_value - 1 \\ \text{if } ABS(F_{l1} - R_{l2}) \le 3, \text{ then } defect\_value - 1 \\ \text{return } defect\_value \end{cases} \tag{8}$$

where $Len_{diff}(P_v)$ has a maximal fitness value of 3; the fitness value is decreased by 1 for each primer pair whose length difference is less than or equal to 3 bp. ABS represents the absolute value.
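Formula (8) translates directly into a few lines of code. The sketch below takes the four primer lengths as plain integers; the function name `len_diff` is a hypothetical label.

```python
def len_diff(fl1, rl1, fl2, rl2):
    """Length-difference penalty of formula (8): 3 (worst) down to 0 (best)."""
    defect_value = 3
    # The three primer pairs compared in formula (8).
    for a, b in ((fl1, rl1), (fl2, rl2), (fl1, rl2)):
        if abs(a - b) <= 3:
            defect_value -= 1
    return defect_value
```

For example, lengths (20, 22, 21, 19) satisfy all three comparisons and give 0, whereas (16, 28, 16, 28) satisfy none and give 3.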

GC content and GC clamp

The function GC%(P) is proposed to represent the ratio of G and C nucleotides appearing in a primer:

$$GC\%(P) = \frac{G_{number}(P) + C_{number}(P)}{|P|} \tag{9}$$

where P represents a primer and |P| represents the length of primer P; $G_{number}(P)$ and $C_{number}(P)$ represent the numbers of the nucleotides G and C, respectively.

In general primer design, the typical GC proportion constraint is set between 40% and 60%. However, because the designed CTPP primers must contain the target SNP, the attainable range of the GC proportion is limited. To relax this constraint, the GC proportion of a primer is allowed to lie between 20% and 80%. The function $GC_{proportion}(P_v)$ is proposed with a maximal fitness value of 4 to lead the GC%(P) of the CTPP primers towards this constraint:

$$GC_{proportion}(P_v) = \begin{cases} defect\_value = 4 \\ \text{if } 20 \le GC\%(P_{f1}) \le 80, \text{ then } defect\_value - 1 \\ \text{if } 20 \le GC\%(P_{r1}) \le 80, \text{ then } defect\_value - 1 \\ \text{if } 20 \le GC\%(P_{f2}) \le 80, \text{ then } defect\_value - 1 \\ \text{if } 20 \le GC\%(P_{r2}) \le 80, \text{ then } defect\_value - 1 \\ \text{return } defect\_value \end{cases} \tag{10}$$
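Formulas (9) and (10) can be sketched together as follows. The function names are hypothetical, and GC%(P) is scaled to a percentage here, which is an assumption made so that it can be compared directly with the 20 and 80 thresholds.

```python
def gc_percent(primer):
    """GC content of one primer, expressed in percent (assumed scaling)."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def gc_proportion(primers, low=20.0, high=80.0):
    """GC-proportion penalty of formula (10) for the four CTPP primers.

    `primers` is expected to hold (Pf1, Pr1, Pf2, Pr2); the result runs
    from 4 (no primer within range) down to 0 (all four within range).
    """
    defect_value = 4
    for p in primers:
        if low <= gc_percent(p) <= high:
            defect_value -= 1
    return defect_value
```

For instance, `gc_percent("ATGC")` is 50.0, and a set such as `["ATGC", "ATGC", "GGCC", "AAAA"]` scores 2 because the last two primers fall outside the 20-80 range.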

To favor the presence of a G or C nucleotide at the 3’ terminal of each primer, which ensures a tight localized hybridization bond, the function $GC_{clamp}(P_v)$ is proposed with a maximal fitness value of 4 as follows:

$$GC_{clamp}(P_v) = \begin{cases} defect\_value = 4 \\ \text{if 3’ end of } P_{f1} \text{ is ‘G’ or ‘C’, then } defect\_value - 1 \\ \text{if 3’ end of } P_{r1} \text{ is ‘G’ or ‘C’, then } defect\_value - 1 \\ \text{if 3’ end of } P_{f2} \text{ is ‘G’ or ‘C’, then } defect\_value - 1 \\ \text{if 3’ end of } P_{r2} \text{ is ‘G’ or ‘C’, then } defect\_value - 1 \\ \text{return } defect\_value \end{cases} \tag{11}$$
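A short sketch of formula (11) follows. It assumes each primer is supplied as a string written 5’ to 3’, so that the last character is the 3’ terminal base; the function name `gc_clamp` is a hypothetical label.

```python
def gc_clamp(primers):
    """GC-clamp penalty of formula (11) for the four CTPP primers.

    Each primer string is assumed to be written 5'->3', so its last
    character is the 3' end. Returns 4 (no clamp) down to 0 (all clamped).
    """
    defect_value = 4
    for p in primers:
        if p[-1].upper() in ("G", "C"):
            defect_value -= 1
    return defect_value
```

For example, `gc_clamp(["ATG", "ATC", "ATA", "ATT"])` returns 2, since only the first two primers end in G or C.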

Figure 1 Parameters of the template DNA and the CTPP primer set. Symbols indicate: F: Forward primer; R: Reverse primer; s: Start nucleotide position; e: End nucleotide position; P: Length of PCR product using a primer set (F/R); l: Length of primer or product; ι: Length of DNA template; δ1: Length from the $R_{s1}$ end to downstream of template DNA; δ2: Length from $F_{s2}$ to the downstream end of template DNA.

the PCR-CTPP is less developed for its computational tool providing PCR-CTPP primer design. A novel strategy for PCR-CTPP primer design has been introduced in this paper, and a freely available web server implementing this method has been constructed. With experimental validation, our proposed GA-based method is a useful algorithm for designing feasible CTPP primers, and it conforms to most of the PCR-CTPP constraints.

Availability and requirements

Project name: GA-CTPP: Confronting two-pair primer design using genetic algorithm.

Project home page: http://bio.kuas.edu.tw/ga-ctpp/.

Operating system(s): Platform independent (any operating system with a web browser).

Programming language: Java.

Other requirements: Java 1.5.

License: none for academic users. For any restrictions regarding use by non-academics, please contact the corresponding author.

Additional material

Additional file 1: The differences between our previous publication in the BIBE 2009 conference [34] and this study.

Additional file 2: The performances for primer design using our proposed GA-CTPP algorithm between different population sizes of De Jong and Spears’s parameter settings.

Acknowledgements

This work is partly supported by the National Science Council in Taiwan under grants NSC97-2622-E-151-008-CC2, 2221-E-214-050-MY3, NSC96-2311-B037-002, NSC96-2622-E214-004-CC3, NSC2221-E-151-040-, NSC 98-2622-E-151-001-CC2 and the funds KMU-EM-99-1.4 and DOH99-TD-C-111-002. We also thank Mr. Ming-Tz Tsai for technical support.

Author details

1 Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan. 2 Department of Network Systems, Toko University, Chiayi, Taiwan. 3 Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan. 4 Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan. 5 Graduate Institute of Natural Products, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan. 6 Center of Excellence for Environmental Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan. 7 Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan.

Authors’ contributions

C-HY coordinated and oversaw this study. Y-HC participated in the design of the algorithm, and wrote the program and the manuscript. L-YC provided the biochemistry background and introduced the bioinformatics needed for primer design. H-WC performed and verified the PCR experiment, and modified the manuscript. All authors read and approved the final manuscript.

Received: 9 January 2010 Accepted: 13 October 2010 Published: 13 October 2010

References

1. Jasmine F, Ahsan H, Andrulis IL, John EM, Chang-Claude J, Kibriya MG: Whole-genome amplification enables accurate genotyping for microarray-based high-density single nucleotide polymorphism array. Cancer Epidemiol Biomarkers Prev 2008, 17(12):3499-3508.

2. Hui L, DelMonte T, Ranade K: Genotyping using the TaqMan assay. Curr Protoc Hum Genet 2008, Chapter 2(Unit 2):10.

3. Chang HW, Yang CH, Chang PL, Cheng YH, Chuang LY: SNP-RFLPing: restriction enzyme mining for SNPs in genomes. BMC Genomics 2006, 7:30.

4. Lin GT, Tseng HF, Yang CH, Hou MF, Chuang LY, Tai HT, Tai MH, Cheng YH, Wen CH, Liu CS, et al: Combinational polymorphisms of seven CXCL12-related genes are protective against breast cancer in Taiwan. OMICS 2009, 13(2):165-172.

5. Yen CY, Liu SY, Chen CH, Tseng HF, Chuang LY, Yang CH, Lin YC, Wen CH, Chiang WF, Ho CH, et al: Combinational polymorphisms of four DNA repair genes XRCC1, XRCC2, XRCC3, and XRCC4 and their association with oral cancer in Taiwan. J Oral Pathol Med 2008, 37(5):271-277.

6. Aomori T, Yamamoto K, Oguchi-Katayama A, Kawai Y, Ishidao T, Mitani Y, Kogo Y, Lezhava A, Fujita Y, Obayashi K, et al: Rapid single-nucleotide polymorphism detection of cytochrome P450 (CYP2C9) and vitamin K epoxide reductase (VKORC1) genes for the warfarin dose adjustment by the SMart-amplification process version 2. Clin Chem 2009, 55(4):804-812.

7. Chuang LY, Yang CH, Tsui KH, Cheng YH, Chang PL, Wen CH, Chang HW: Restriction enzyme mining for SNPs in genomes. Anticancer Res 2008, 28(4A):2001-2007.

8. NCBI: Restriction Fragment Length Polymorphism (RFLP). [http://www.ncbi.nlm.nih.gov/genome/probe/doc/TechRFLP.shtml] (accessed September 2009).

9. Hamajima N, Saito T, Matsuo K, Kozaki K, Takahashi T, Tajima K: Polymerase chain reaction with confronting two-pair primers for polymorphism genotyping. Jpn J Cancer Res 2000, 91(9):865-868.

10. Hamajima N, Saito T, Matsuo K, Tajima K: Competitive amplification and unspecific amplification in polymerase chain reaction with confronting two-pair primers. J Mol Diagn 2002, 4(2):103-107.

11. Maruyama C, Suemizu H, Tamamushi S, Kimoto S, Tamaoki N, Ohnishi Y: Genotyping the mouse severe combined immunodeficiency mutation using the polymerase chain reaction with confronting two-pair primers (PCR-CTPP). Exp Anim 2002, 51(4):391-393.

12. Tamakoshi A, Hamajima N, Kawase H, Wakai K, Katsuda N, Saito T, Ito H, Hirose K, Takezaki T, Tajima K: Duplex polymerase chain reaction with confronting two-pair primers (PCR-CTPP) for genotyping alcohol dehydrogenase beta subunit (ADH2) and aldehyde dehydrogenase 2 (ALDH2). Alcohol Alcohol 2003, 38(5):407-410.

13. Katsuda N, Hamajima N, Tamakoshi A, Wakai K, Matsuo K, Saito T, Tajima K, Tominaga S: Helicobacter pylori seropositivity and the myeloperoxidase G-463A polymorphism in combination with interleukin-1B C-31T in Japanese health checkup examinees. Jpn J Clin Oncol 2003, 33(4):192-197.

14. Togawa S, Joh T, Itoh M, Katsuda N, Ito H, Matsuo K, Tajima K, Hamajima N: Interleukin-2 gene polymorphisms associated with increased risk of gastric atrophy from Helicobacter pylori infection. Helicobacter 2005, 10(3):172-178.

15. Abu-Amero KK, Al-Boudari OM, Mohamed GH, Dzimiri N: The Glu27 genotypes of the beta2-adrenergic receptor are predictors for severe coronary artery disease. BMC Med Genet 2006, 7:31.

16. Yang SJ, Wang HY, Li XQ, Du HZ, Zheng CJ, Chen HG, Mu XY, Yang CX: Genetic polymorphisms of ADH2 and ALDH2 association with esophageal cancer risk in southwest China. World J Gastroenterol 2007, 13(43):5760-5764.

17. Goldberg DE: Genetic algorithms in search, optimization, and machine learning. New York: Addison-Wesley 1989.

18. Jong KD: Learning with genetic algorithms: an overview. Mach Learning 1988, 3:121-138.

19. Holland JH: Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. MIT Press 1992.

20. Chang HW, Chuang LY, Ho CH, Chang PL, Yang CH: Odds ratio-based genetic algorithms for generating SNP barcodes of genotypes to predict disease susceptibility. OMICS 2008, 12(1):71-81.

21. Wu JS, Lee C, Wu CC, Shiue YL: Primer design using genetic algorithm. Bioinformatics 2004, 20(11):1710-1717.

22. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001, 29(1):308-311.

23. Sambrook J, Fritsch EF, Maniatis T: Molecular cloning. Cold Spring Harbor Laboratory Press Cold Spring Harbor, NY 1989.

24. De Jong KA, Spears WM: An analysis of the interacting roles of population size and crossover in genetic algorithms. Springer 1990, 1:38-47.

25. Sakurai T, Reichert J, Hoffman EJ, Cai G, Jones HB, Faham M, Buxbaum JD: A large-scale screen for coding variants in SERT/SLC6A4 in autism spectrum disorders. Autism Res 2008, 1(4):251-257.

26. Goldberg TE, Kotov R, Lee AT, Gregersen PK, Lencz T, Bromet E, Malhotra AK: The serotonin transporter gene and disease modification in psychosis: Evidence for systematic differences in allelic directionality at the 5-HTTLPR locus. Schizophr Res 2009, 111(1-3):103-108.

27. Mandelli L, Mazza M, Martinotti G, Di Nicola M, Daniela T, Colombo E, Missaglia S, De Ronchi D, Colombo R, Janiri L, et al: Harm avoidance moderates the influence of serotonin transporter gene variants on treatment outcome in bipolar patients. J Affect Disord 2009, 119(1-3):205-209.

28. Yang CH, Cheng YH, Chuang LY, Chang HW: SNP-Flankplus: SNP ID-centric retrieval for SNP flanking sequences. Bioinformation 2008, 3(4):147-149.

29. Kampke T, Kieninger M, Mecklenburg M: Efficient primer design algorithms. Bioinformatics 2001, 17(3):214-225.

30. Wu J, Wang J, Chen J: A practical algorithm for multiplex PCR primer set selection. Int J Bioinform Res Appl 2009, 5(1):38-49.

31. Wang J, Li KB, Sung WK: G-PRIMER: greedy algorithm for selecting minimal primer set. Bioinformatics 2004, 20(15):2473-2475.

32. Chen YF, Chen RC, Chan YK, Pan RH, Hseu YC, Lin E: Design of multiplex PCR primers using heuristic algorithm for sequential deletion applications. Comput Biol Chem 2009, 33(2):181-188.

33. Chen SH, Lin CY, Cho CS, Lo CZ, Hsiung CA: Primer Design Assistant (PDA): A web-based primer design tool. Nucleic Acids Res 2003, 31(13):3751-3754.

34. Yang CH, Cheng YH, Chuang LY, Chang HW: Genetic Algorithm for the Design of Confronting Two-Pair Primers. Ninth IEEE International Conference on BioInformatics and BioEngineering (BIBE) 2009, 242-247.

35. Hamajima N: PCR-CTPP: a new genotyping technique in the era of genetic epidemiology. Expert Rev Mol Diagn 2001, 1(1):119-123.

doi:10.1186/1471-2105-11-509

Cite this article as: Yang et al.: Confronting two-pair primer design for enzyme-free SNP genotyping based on a genetic algorithm. BMC Bioinformatics 2010 11:509.
