METHODOLOGY ARTICLE
Open Access
Confronting two-pair primer design for
enzyme-free SNP genotyping based on a genetic
algorithm
Cheng-Hong Yang1,2, Yu-Huei Cheng1, Li-Yeh Chuang3*, Hsueh-Wei Chang4,5,6,7*
Abstract
Background: The polymerase chain reaction with confronting two-pair primers (PCR-CTPP) method produces allele-specific DNA bands of different lengths by adding four designed primers, and it achieves single nucleotide polymorphism (SNP) genotyping by electrophoresis without further steps. It is a time- and cost-effective SNP genotyping method that has the advantage of simplicity. However, computation of feasible CTPP primers is still challenging.
Results: In this study, we propose a genetic algorithm (GA)-based method to design a feasible CTPP primer set for a reliable PCR experiment. The SLC6A4 gene was tested with 288 SNPs in dry dock experiments, which indicated that the proposed algorithm provides CTPP primers that satisfy most primer constraints. One SNP, rs12449783 in the SLC6A4 gene, was taken as an example for genotyping experiments using electrophoresis, which validated the GA-based design method as providing reliable CTPP primer sets for SNP genotyping.
Conclusions: The GA-based CTPP primer design method provides all forms of estimation for the common primer constraints of PCR-CTPP. The GA-CTPP program is implemented in Java, and a user-friendly input interface is freely available at http://bio.kuas.edu.tw/ga-ctpp/.
Background
Genotyping is a common technique used in association studies of diseases and cancers. Although many high-throughput platforms for single nucleotide polymorphism (SNP) genotyping, such as SNP arrays [1] and real-time PCR using TaqMan probes [2], have been introduced, most laboratories still validate SNPs or novel mutations by PCR-restriction fragment length polymorphism (RFLP) genotyping [3-6] because this method is inexpensive for small-scale genotyping. One shortcoming of PCR-RFLP is the long digestion time (usually 2-3 hours) required by restriction enzymes [7,8].
Recently, a restriction enzyme-free SNP genotyping technique called “PCR with confronting two-pair primers (PCR-CTPP)” was developed [9-12]. It has been applied successfully to the genotyping of at least 30 different SNPs. For example, interleukin-1B (IL-1B) C-31T, interleukin-2 (IL-2) -330G, beta2-adrenergic receptor (beta2-AR) Gln27Glu, and aldehyde dehydrogenase 2 (ALDH2) were genotyped by PCR-CTPP for association studies with smoking behavior [13], Helicobacter pylori-induced gastric atrophy [14], severe coronary artery disease [15], and esophageal cancer risk [16], respectively. These applications indicate that the PCR-CTPP method is suitable and reliable for most SNPs. It also avoids the consumption of restriction enzymes. However, PCR-CTPP tolerates only a small difference in melting temperature ($T_{m\text{-}diff}$) between the four primers [10]. Moreover, typical primer design constraints also need to be considered, such as primer length, primer length difference, PCR product length, GC proportion, melting temperature ($T_m$), GC clamp, dimers (including cross-dimers and self-dimers), hairpin structures, and specificity. Hence, the computational requirements of PCR-CTPP primer design are rather high.
* Correspondence: [email protected]; [email protected]
3Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan
4Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan
Full list of author information is available at the end of the article
© 2010 Yang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
To design CTPP primers under these many constraints, we introduce a genetic algorithm (GA) [17,18] to improve the design of CTPP primer sets. A GA is a stochastic search algorithm modeled on the process of natural selection underlying biological evolution [19]. It constitutes a randomized search and optimization technique that derives its working principle from natural genetics. Since GA computation is similar in nature to the evolution of species, it has been applied to DNA-related problems such as SNP interaction [20] and general primer design [21]. The evolutionary computations involved, such as selection, crossover and mutation, are effective in searching for optimal solutions for many CTPP primer sets. After each run, chromosomes in a GA share information with each other, and the superior solutions, ranked by a fitness rule, are refined from generation to generation. Therefore, CTPP primers obeying the typical primer design constraints described above can be mined.
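As a rough illustration of the selection, crossover and mutation steps named above, the following Python sketch evolves six-integer chromosomes toward a lower fitness value. It is not the GA-CTPP implementation: the tournament selection, elitism, rates, population size, product-length bounds and `toy_fitness` are all hypothetical choices made for this example. Only the chromosome layout and the 16 to 28 bp primer length range are taken from the Methods.

```python
import random

PRIMER_RANGE = (16, 28)     # primer length bounds given in the Methods
PRODUCT_RANGE = (100, 500)  # hypothetical PCR product length bounds
# Gene order follows P_v = (F_l1, P_l1, R_l1, F_l2, P_l2, R_l2).
BOUNDS = [PRIMER_RANGE, PRODUCT_RANGE, PRIMER_RANGE,
          PRIMER_RANGE, PRODUCT_RANGE, PRIMER_RANGE]


def random_chromosome(rng):
    """Draw one chromosome with every gene inside its bounds."""
    return [rng.randint(lo, hi) for lo, hi in BOUNDS]


def toy_fitness(chromosome):
    """Hypothetical stand-in fitness: penalise primer length differences."""
    f1, _, r1, f2, _, r2 = chromosome
    return abs(f1 - r1) + abs(f2 - r2) + abs(f1 - r2)


def evolve(fitness, pop_size=20, generations=30,
           crossover_rate=0.6, mutation_rate=0.05, seed=0):
    """Minimise `fitness` with a small elitist GA (illustrative settings)."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        next_population = [population[0][:]]  # elitism: keep the best
        while len(next_population) < pop_size:
            # Selection: binary tournament, lower fitness wins.
            parent_a = min(rng.sample(population, 2), key=fitness)
            parent_b = min(rng.sample(population, 2), key=fitness)
            child = parent_a[:]
            # Crossover: single cut point between genes.
            if rng.random() < crossover_rate:
                cut = rng.randint(1, len(parent_a) - 1)
                child = parent_a[:cut] + parent_b[cut:]
            # Mutation: redraw a gene inside its own bounds.
            for i, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < mutation_rate:
                    child[i] = rng.randint(lo, hi)
            next_population.append(child)
        population = next_population
    return min(population, key=fitness)
```

Calling `evolve(toy_fitness)` returns the best chromosome found; replacing `toy_fitness` with a function built from the real primer constraints would give the kind of search the paper describes.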
Methods
Problem formulation
The CTPP primer design problem can be described as follows. Let $T_D$ be a template DNA sequence, which is composed of nucleotide codes with an identified SNP. $T_D$ is defined by:

$$T_D = \{\, B_i \mid i \text{ is the index of DNA sequence},\ 1 \le i \le \iota,\ \exists!\, B_i \in \text{IUPAC code of SNP} \,\} \tag{1}$$

where $B_i$ is the regular nucleotide code (A, T, C, or G) mixed with a single IUPAC code of SNP (M, R, W, S, Y, K, V, H, D, B or N), $\iota$ is the length of the template DNA (Figure 1), and $\exists!$ denotes existence and uniqueness. For the target SNP, we focused only on true SNPs described in dbSNP [22] of NCBI, i.e., deletion/insertion polymorphisms (DIPs) and multi-nucleotide polymorphisms (MNPs) are not included.
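The condition in formula (1), a template of regular nucleotides containing exactly one IUPAC SNP code, can be checked with a few lines of Python. This is a minimal sketch; the function name `find_snp_index` is hypothetical and is not part of GA-CTPP.

```python
REGULAR_CODES = set("ATCG")
SNP_CODES = set("MRWSYKVHDBN")  # IUPAC SNP codes listed in the text
ALLOWED_CODES = REGULAR_CODES | SNP_CODES


def find_snp_index(template):
    """Return the 1-based index of the single SNP code in the template.

    Raises ValueError if the template holds an unknown character or does
    not contain exactly one SNP code (the "exists exactly one" condition).
    """
    template = template.upper()
    unknown = sorted(set(template) - ALLOWED_CODES)
    if unknown:
        raise ValueError(f"unknown nucleotide codes: {unknown}")
    positions = [i for i, code in enumerate(template, start=1)
                 if code in SNP_CODES]
    if len(positions) != 1:
        raise ValueError(f"expected exactly one SNP code, found {len(positions)}")
    return positions[0]
```

For example, `find_snp_index("ACGTRACGT")` returns 5, the position of the code R (A or G).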
The CTPP primer design requires two pairs of short sequences located in $T_D$ around a defined SNP site, as illustrated in Figure 1. The forward primer 1 ($P_{f1}$) is a short sense sequence located some distance upstream (5’ end) of the defined SNP site; the reverse primer 1 ($P_{r1}$) is a short antisense sequence which contains a nucleotide (the minor allele of the defined SNP site) at its 3’ end; the forward primer 2 ($P_{f2}$) is a short sense sequence which contains a nucleotide (the major allele of the defined SNP site) at its 3’ end; and the reverse primer 2 ($P_{r2}$) is a short antisense sequence located some distance downstream (3’ end) of the defined SNP site. These four primers are defined as follows:
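$$P_{f1} = \{\, B_i \mid i \text{ is the index of } T_D,\ F_{s1} \le i \le F_{e1} \,\} \tag{2}$$

$$P_{r1} = \{\, \bar{B}_i \mid i \text{ is the index of } T_D,\ R_{s1} \le i \le R_{e1} \,\} \tag{3}$$

$$P_{f2} = \{\, B_i \mid i \text{ is the index of } T_D,\ F_{s2} \le i \le F_{e2} \,\} \tag{4}$$

$$P_{r2} = \{\, \bar{B}_i \mid i \text{ is the index of } T_D,\ R_{s2} \le i \le R_{e2} \,\} \tag{5}$$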
where $P_{f1}$/$P_{r1}$ and $P_{f2}$/$P_{r2}$ are the two primer pairs of PCR-CTPP. $F_{s1}$ vs. $F_{e1}$ and $R_{s1}$ vs. $R_{e1}$ indicate the start index vs. the end index of $P_{f1}$ and $P_{r1}$ in $T_D$, respectively. $F_{s2}$ vs. $F_{e2}$ and $R_{s2}$ vs. $R_{e2}$ indicate the start index vs. the end index of $P_{f2}$ and $P_{r2}$ in $T_D$, respectively. $\bar{B}_i$ is the complementary nucleotide of $B_i$, which is described in formula (1). For example, if $B_i$ = ‘A’, then $\bar{B}_i$ = ‘T’; if $B_i$ = ‘C’, then $\bar{B}_i$ = ‘G’, and vice versa.
The target SNP site is defined at the end positions of $P_{f2}$ and $P_{r1}$, which are indicated by the symbols $F_{e2}$ and $R_{e1}$, respectively. As described in Figure 1, a vector $P_v$ with $F_{l1}$, $P_{l1}$, $R_{l1}$, $F_{l2}$, $P_{l2}$ and $R_{l2}$ is essential to design the CTPP primer sets. This vector is defined as follows:

$$P_v = (F_{l1}, P_{l1}, R_{l1}, F_{l2}, P_{l2}, R_{l2}) \tag{6}$$

$F_{l1}$, $P_{l1}$, $R_{l1}$, $F_{l2}$, $P_{l2}$ and $R_{l2}$ represent the lengths of forward primer 1, the PCR product between $P_{f1}$ and $P_{r1}$, reverse primer 1, forward primer 2, the PCR product between $P_{f2}$ and $P_{r2}$, and reverse primer 2, respectively. Consequently, the forward and reverse primers can be acquired from $P_v$, which is the prototype of a chromosome in the GA and is used to perform evolutionary computations as described in the following sections.
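To make the step from the vector $P_v$ to four primer sequences concrete, here is a minimal Python sketch. The geometry is an assumption based on the description above: $P_{f2}$ ends on the SNP, $P_{r1}$ starts on the SNP in sense coordinates, and each product length is measured from the outer end of the forward primer to the outer end of the reverse primer. Reverse primers are returned as reverse complements so that they read 5’ to 3’. The function names and the allele arguments are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Complement each base and reverse, so the result reads 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def primers_from_vector(template, snp_index, major, minor, vector):
    """Derive (Pf1, Pr1, Pf2, Pr2) from P_v under the assumed geometry.

    template  : sense strand with one IUPAC SNP code
    snp_index : 1-based position of the SNP in the template
    major/minor : allele bases on the sense strand
    vector    : (F_l1, P_l1, R_l1, F_l2, P_l2, R_l2)
    """
    f_l1, p_l1, r_l1, f_l2, p_l2, r_l2 = vector
    snp = snp_index - 1        # 0-based SNP position
    f_s2 = snp - f_l2 + 1      # Pf2 ends on the SNP
    r1_last = snp + r_l1 - 1   # Pr1 covers snp .. r1_last
    f_s1 = r1_last - p_l1 + 1  # product 1 spans f_s1 .. r1_last
    r2_last = f_s2 + p_l2 - 1  # product 2 spans f_s2 .. r2_last
    r_s2 = r2_last - r_l2 + 1  # Pr2 covers r_s2 .. r2_last
    if min(f_s1, f_s2) < 0 or max(r1_last, r2_last) >= len(template):
        raise ValueError("primer set does not fit inside the template")
    if f_s1 + f_l1 > snp or r_s2 <= snp:
        raise ValueError("outer primers must not overlap the SNP site")

    def window(start, length, allele=None):
        bases = list(template[start:start + length])
        if allele is not None:
            bases[snp - start] = allele  # write the allele at the SNP site
        return "".join(bases)

    p_f1 = window(f_s1, f_l1)
    p_r1 = reverse_complement(window(snp, r_l1, minor))
    p_f2 = window(f_s2, f_l2, major)
    p_r2 = reverse_complement(window(r_s2, r_l2))
    return p_f1, p_r1, p_f2, p_r2
```

With the toy template `"A"*10 + "C"*10 + "R" + "G"*10 + "T"*10` (SNP at index 21), alleles G (major) and A (minor), and the deliberately short vector `(5, 15, 4, 4, 14, 5)`, the function returns `("ACCCC", "CCCT", "CCCG", "CCCCC")`. Real primers would use lengths in the 16 to 28 bp range.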
Definition of the fitness function
The regular primer design constraints are combined into a fitness function, and the GA searches for primer sets that minimize the fitness value. The fitness function is defined as follows:

$$
\begin{aligned}
Fitness(P_v) ={}& 3 \times \big(Len_{diff}(P_v) + GC_{proportion}(P_v) + GC_{clamp}(P_v)\big) \\
&+ 10 \times \big(dimer(P_v) + hairpin(P_v) + specificity(P_v)\big) \\
&+ 50 \times \big(T_m(P_v) + T_{m\text{-}diff}(P_v)\big) \\
&+ 60 \times Avg\_T_{m\text{-}diff}(P_v) \\
&+ 100 \times PCR_{len}(P_v)
\end{aligned} \tag{7}
$$

The weights (3, 10, 50, 60 and 100) of the fitness function reflect the relative importance of the primer constraints. These weights were set empirically for PCR-CTPP conditions, and they can be adjusted according to the user's experimental requirements.
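The weighted sum can be sketched in Python as follows. The grouping of terms under each weight follows our reading of equation (7); the dictionary keys are hypothetical names, and each penalty value is assumed to have been computed elsewhere by the corresponding constraint function.

```python
# Weight per constraint, grouped as read from equation (7).
WEIGHTS = {
    "len_diff": 3, "gc_proportion": 3, "gc_clamp": 3,
    "dimer": 10, "hairpin": 10, "specificity": 10,
    "tm": 50, "tm_diff": 50,
    "avg_tm_diff": 60,
    "pcr_len": 100,
}


def fitness(penalties):
    """Weighted sum of constraint penalties; lower is better."""
    missing = set(WEIGHTS) - set(penalties)
    if missing:
        raise KeyError(f"missing penalties: {sorted(missing)}")
    return sum(WEIGHTS[name] * penalties[name] for name in WEIGHTS)
```

A primer set that violates nothing scores 0. As an example, penalties of `len_diff=1`, `gc_proportion=2`, `dimer=1` and `pcr_len=1` with all other terms at 0 give 3 × 3 + 10 × 1 + 100 × 1 = 119, which shows how strongly a product-length violation dominates the score.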
Yang et al. BMC Bioinformatics 2010, 11:509 http://www.biomedcentral.com/1471-2105/11/509
Primer length
The feasible primer length for a PCR experiment is set between 16 and 28 bp. The $T_m$ increases for longer primers and decreases for relatively short primers. Accordingly, primers which are neither too long nor too short are suitable. We have limited the random values of $F_{l1}$, $R_{l1}$, $F_{l2}$ and $R_{l2}$ to an appropriate range; therefore, primer length estimation is not included in the fitness function.
A length difference ($Len_{diff}$) of less than or equal to 3 bp between the $F_{l1}$/$R_{l1}$, $F_{l2}$/$R_{l2}$, and $F_{l1}$/$R_{l2}$ primer sets is considered optimal. The primer length difference function is defined as follows:

$$
Len_{diff}(P_v) = \begin{cases}
defect\_value = 3 \\
\text{if } ABS(F_{l1} - R_{l1}) \le 3, \text{ then } defect\_value - 1 \\
\text{if } ABS(F_{l2} - R_{l2}) \le 3, \text{ then } defect\_value - 1 \\
\text{if } ABS(F_{l1} - R_{l2}) \le 3, \text{ then } defect\_value - 1 \\
\text{return } defect\_value
\end{cases} \tag{8}
$$

where $Len_{diff}(P_v)$ has a maximal fitness value of 3; the fitness value is decreased by 1 for each primer pair whose length difference is less than or equal to 3 bp. ABS represents the absolute value.
GC content and GC clamp
The function GC%(P) is proposed to represent the ratio of G and C nucleotides appearing in a primer:

$$GC\%(P) = \frac{G_{number}(P) + C_{number}(P)}{|P|} \tag{9}$$

where P represents a primer and |P| represents the length of primer P; $G_{number}(P)$ and $C_{number}(P)$ represent the numbers of the nucleotides G and C, respectively.
In general primer design, the typical GC proportion constraint is set between 40% and 60%. However, the designed CTPP primers must contain the target SNP, which limits the attainable range of the GC proportion. To relax this constraint, the GC proportion allowed in a primer is adjusted to between 20% and 80%. The function $GC_{proportion}(P_v)$ is proposed, with a maximal fitness value of 4, to lead the GC%(P) of the CTPP primers toward this constraint:

$$
GC_{proportion}(P_v) = \begin{cases}
defect\_value = 4 \\
\text{if } 20 \le GC\%(P_{f1}) \le 80, \text{ then } defect\_value - 1 \\
\text{if } 20 \le GC\%(P_{r1}) \le 80, \text{ then } defect\_value - 1 \\
\text{if } 20 \le GC\%(P_{f2}) \le 80, \text{ then } defect\_value - 1 \\
\text{if } 20 \le GC\%(P_{r2}) \le 80, \text{ then } defect\_value - 1 \\
\text{return } defect\_value
\end{cases} \tag{10}
$$
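Equations (9) and (10) can be sketched together in Python. Note one assumption: equation (9) is written as a ratio, so the code multiplies by 100 in order to compare against the 20 to 80 percent window. The function names are hypothetical.

```python
def gc_percent(primer):
    """Percentage of G and C bases in a primer (equation (9) times 100)."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def gc_proportion(primers, low=20.0, high=80.0):
    """Penalty in 0..4: one point per primer whose GC% is outside [low, high]."""
    if len(primers) != 4:
        raise ValueError("expected the four CTPP primers")
    defect_value = 4
    for primer in primers:
        if low <= gc_percent(primer) <= high:
            defect_value -= 1
    return defect_value
```

As a toy example, `gc_percent("GGCCAATT")` is 50.0, and `gc_proportion(["ATGC", "AAAA", "GGGG", "ATAT"])` returns 3 because only the first primer falls inside the window.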
To favor the presence of G or C nucleotides at the 3’ terminal of a primer, which ensures a tight localized hybridization bond, the function $GC_{clamp}(P_v)$ is proposed with a maximal fitness value of 4 as follows:

$$
GC_{clamp}(P_v) = \begin{cases}
defect\_value = 4 \\
\text{if 3’ end of } P_{f1} \text{ is ‘G’ or ‘C’, then } defect\_value - 1 \\
\text{if 3’ end of } P_{r1} \text{ is ‘G’ or ‘C’, then } defect\_value - 1 \\
\text{if 3’ end of } P_{f2} \text{ is ‘G’ or ‘C’, then } defect\_value - 1 \\
\text{if 3’ end of } P_{r2} \text{ is ‘G’ or ‘C’, then } defect\_value - 1 \\
\text{return } defect\_value
\end{cases} \tag{11}
$$
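A GC clamp check along the lines of equation (11) might look like the sketch below. It assumes every primer string is written 5’ to 3’, so that the last character is the 3’ terminal base; the function name is hypothetical.

```python
def gc_clamp(primers):
    """Penalty in 0..4: one point per primer whose 3' terminal base is not G or C."""
    if len(primers) != 4:
        raise ValueError("expected the four CTPP primers")
    defect_value = 4
    for primer in primers:
        if primer[-1].upper() in ("G", "C"):
            defect_value -= 1
    return defect_value
```

For example, `gc_clamp(["ATG", "ATC", "ATA", "ATT"])` returns 2 because two of the four primers end in G or C.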
Figure 1 Parameters of the template DNA and the CTPP primer set. Symbols indicate: F: Forward primer; R: Reverse primer; s: Start nucleotide position; e: End nucleotide position; P: Length of PCR product using a primer set (F/R); l: Length of primer or product; ι: Length of DNA template; δ1: Length from the Rs1 end to downstream of template DNA; δ2: Length from Fs2 to the downstream end of template DNA.
the PCR-CTPP is less developed in terms of computational tools for PCR-CTPP primer design. A novel strategy for PCR-CTPP primer design has been introduced in this paper, and a freely available web server implementing this method has also been constructed. With experimental validation, our proposed GA-based method is a useful algorithm for designing feasible CTPP primers that conform to most of the PCR-CTPP constraints.
Availability and requirements
Project name: GA-CTPP: Confronting two-pair primer
design using genetic algorithm.
Project home page: http://bio.kuas.edu.tw/ga-ctpp/.
Operating system(s): Any operating system with a web browser.
Programming language: Java.
Other requirements: Java 1.5.
License: none for academic users. For any restrictions
regarding the use by non-academics please contact the
corresponding author.
Additional material
Additional file 1: ‘The differences between our previous publication in BIBE 2009 conference [34] and this study’.
Additional file 2: ‘The performances for primer design using our proposed GA-CTPP algorithm between different population sizes of De Jong and Spears’s parameter settings’.
Acknowledgements
This work is partly supported by the National Science Council in Taiwan under grants NSC97-2622-E-151-008-CC2, 2221-E-214-050-MY3, NSC96-2311-B037-002, NSC96-2622-E214-004-CC3, NSC2221-E-151-040-, NSC 98-2622-E-151-001-CC2 and the funds KMU-EM-99-1.4 and DOH99-TD-C-111-002. We also thank Mr. Ming-Tz Tsai for technical support.
Author details
1Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan. 2Department of Network Systems, Toko University, Chiayi, Taiwan. 3Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan. 4Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan. 5Graduate Institute of Natural Products, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan. 6Center of Excellence for Environmental Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan. 7Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan.
Authors’ contributions
C-HY coordinated and oversaw this study. Y-HC participated in the design of the algorithm, and wrote the program and the manuscript. L-YC provided the biochemistry background and introduced the bioinformatics needed for primer design. H-WC performed and verified the PCR experiment, and modified the manuscript. All authors read and approved the final manuscript.
Received: 9 January 2010 Accepted: 13 October 2010 Published: 13 October 2010
References
1. Jasmine F, Ahsan H, Andrulis IL, John EM, Chang-Claude J, Kibriya MG: Whole-genome amplification enables accurate genotyping for microarray-based high-density single nucleotide polymorphism array. Cancer Epidemiol Biomarkers Prev 2008, 17(12):3499-3508.
2. Hui L, DelMonte T, Ranade K: Genotyping using the TaqMan assay. Curr Protoc Hum Genet 2008, Chapter 2(Unit 2):10.
3. Chang HW, Yang CH, Chang PL, Cheng YH, Chuang LY: SNP-RFLPing: restriction enzyme mining for SNPs in genomes. BMC Genomics 2006, 7:30.
4. Lin GT, Tseng HF, Yang CH, Hou MF, Chuang LY, Tai HT, Tai MH, Cheng YH, Wen CH, Liu CS, et al: Combinational polymorphisms of seven CXCL12-related genes are protective against breast cancer in Taiwan. OMICS 2009, 13(2):165-172.
5. Yen CY, Liu SY, Chen CH, Tseng HF, Chuang LY, Yang CH, Lin YC, Wen CH, Chiang WF, Ho CH, et al: Combinational polymorphisms of four DNA repair genes XRCC1, XRCC2, XRCC3, and XRCC4 and their association with oral cancer in Taiwan. J Oral Pathol Med 2008, 37(5):271-277.
6. Aomori T, Yamamoto K, Oguchi-Katayama A, Kawai Y, Ishidao T, Mitani Y, Kogo Y, Lezhava A, Fujita Y, Obayashi K, et al: Rapid single-nucleotide polymorphism detection of cytochrome P450 (CYP2C9) and vitamin K epoxide reductase (VKORC1) genes for the warfarin dose adjustment by the SMart-amplification process version 2. Clin Chem 2009, 55(4):804-812.
7. Chuang LY, Yang CH, Tsui KH, Cheng YH, Chang PL, Wen CH, Chang HW: Restriction enzyme mining for SNPs in genomes. Anticancer Res 2008, 28(4A):2001-2007.
8. NCBI: Restriction Fragment Length Polymorphism (RFLP). [http://www.ncbi.nlm.nih.gov/genome/probe/doc/TechRFLP.shtml], (accessed September 2009).
9. Hamajima N, Saito T, Matsuo K, Kozaki K, Takahashi T, Tajima K: Polymerase chain reaction with confronting two-pair primers for polymorphism genotyping. Jpn J Cancer Res 2000, 91(9):865-868.
10. Hamajima N, Saito T, Matsuo K, Tajima K: Competitive amplification and unspecific amplification in polymerase chain reaction with confronting two-pair primers. J Mol Diagn 2002, 4(2):103-107.
11. Maruyama C, Suemizu H, Tamamushi S, Kimoto S, Tamaoki N, Ohnishi Y: Genotyping the mouse severe combined immunodeficiency mutation using the polymerase chain reaction with confronting two-pair primers (PCR-CTPP). Exp Anim 2002, 51(4):391-393.
12. Tamakoshi A, Hamajima N, Kawase H, Wakai K, Katsuda N, Saito T, Ito H, Hirose K, Takezaki T, Tajima K: Duplex polymerase chain reaction with confronting two-pair primers (PCR-CTPP) for genotyping alcohol dehydrogenase beta subunit (ADH2) and aldehyde dehydrogenase 2 (ALDH2). Alcohol Alcohol 2003, 38(5):407-410.
13. Katsuda N, Hamajima N, Tamakoshi A, Wakai K, Matsuo K, Saito T, Tajima K, Tominaga S: Helicobacter pylori seropositivity and the myeloperoxidase G-463A polymorphism in combination with interleukin-1B C-31T in Japanese health checkup examinees. Jpn J Clin Oncol 2003, 33(4):192-197.
14. Togawa S, Joh T, Itoh M, Katsuda N, Ito H, Matsuo K, Tajima K, Hamajima N: Interleukin-2 gene polymorphisms associated with increased risk of gastric atrophy from Helicobacter pylori infection. Helicobacter 2005, 10(3):172-178.
15. Abu-Amero KK, Al-Boudari OM, Mohamed GH, Dzimiri N: The Glu27 genotypes of the beta2-adrenergic receptor are predictors for severe coronary artery disease. BMC Med Genet 2006, 7:31.
16. Yang SJ, Wang HY, Li XQ, Du HZ, Zheng CJ, Chen HG, Mu XY, Yang CX: Genetic polymorphisms of ADH2 and ALDH2 association with esophageal cancer risk in southwest China. World J Gastroenterol 2007, 13(43):5760-5764.
17. Goldberg DE: Genetic algorithms in search, optimization, and machine learning. New York: Addison-Wesley 1989.
18. Jong KD: Learning with genetic algorithms: an overview. Mach Learning 1988, 3:121-138.
19. Holland JH: Adaptation in nature and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. MIT Press 1992.
20. Chang HW, Chuang LY, Ho CH, Chang PL, Yang CH: Odds ratio-based genetic algorithms for generating SNP barcodes of genotypes to predict disease susceptibility. OMICS 2008, 12(1):71-81.
21. Wu JS, Lee C, Wu CC, Shiue YL: Primer design using genetic algorithm. Bioinformatics 2004, 20(11):1710-1717.
22. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001, 29(1):308-311.
23. Sambrook J, Fritsch EF, Maniatis T: Molecular cloning. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press 1989.
24. De Jong KA, Spears WM: An analysis of the interacting roles of population size and crossover in genetic algorithms. Springer 1990, 1:38-47.
25. Sakurai T, Reichert J, Hoffman EJ, Cai G, Jones HB, Faham M, Buxbaum JD: A large-scale screen for coding variants in SERT/SLC6A4 in autism spectrum disorders. Autism Res 2008, 1(4):251-257.
26. Goldberg TE, Kotov R, Lee AT, Gregersen PK, Lencz T, Bromet E, Malhotra AK: The serotonin transporter gene and disease modification in psychosis: Evidence for systematic differences in allelic directionality at the 5-HTTLPR locus. Schizophr Res 2009, 111(1-3):103-108.
27. Mandelli L, Mazza M, Martinotti G, Di Nicola M, Daniela T, Colombo E, Missaglia S, De Ronchi D, Colombo R, Janiri L, et al: Harm avoidance moderates the influence of serotonin transporter gene variants on treatment outcome in bipolar patients. J Affect Disord 2009, 119(1-3):205-209.
28. Yang CH, Cheng YH, Chuang LY, Chang HW: SNP-Flankplus: SNP ID-centric retrieval for SNP flanking sequences. Bioinformation 2008, 3(4):147-149.
29. Kampke T, Kieninger M, Mecklenburg M: Efficient primer design algorithms. Bioinformatics 2001, 17(3):214-225.
30. Wu J, Wang J, Chen J: A practical algorithm for multiplex PCR primer set selection. Int J Bioinform Res Appl 2009, 5(1):38-49.
31. Wang J, Li KB, Sung WK: G-PRIMER: greedy algorithm for selecting minimal primer set. Bioinformatics 2004, 20(15):2473-2475.
32. Chen YF, Chen RC, Chan YK, Pan RH, Hseu YC, Lin E: Design of multiplex PCR primers using heuristic algorithm for sequential deletion applications. Comput Biol Chem 2009, 33(2):181-188.
33. Chen SH, Lin CY, Cho CS, Lo CZ, Hsiung CA: Primer Design Assistant (PDA): A web-based primer design tool. Nucleic Acids Res 2003, 31(13):3751-3754.
34. Yang CH, Cheng YH, Chuang LY, Chang HW: Genetic Algorithm for the Design of Confronting Two-Pair Primers. Ninth IEEE international Conference on BioInformatics and BioEngineering (BIBE) 2009, 242-247.
35. Hamajima N: PCR-CTPP: a new genotyping technique in the era of genetic epidemiology. Expert Rev Mol Diagn 2001, 1(1):119-123.
doi:10.1186/1471-2105-11-509
Cite this article as: Yang et al.: Confronting two-pair primer design for enzyme-free SNP genotyping based on a genetic algorithm. BMC Bioinformatics 2010 11:509.