[28] Thomas, J.W., Touchman, J.W., Blakesley, R.W., Bouffard, G.G., Beckstrom-Sternberg, S.M., Margulies, E.H., Blanchette, M., Siepel, A.C., Thomas, P.J., McDowell, J.C., Maskeri, B., Hansen, N.F., Schwartz, M.S., Weber, R.J., Kent, W.J., Karolchik, D., Bruen, T.C., Bevan, R., Cutler, D.J., Schwartz, S., Elnitski, L., Idol, J.R., Prasad, A.B., Lee-Lin, S.Q., Maduro, V.V., Summers, T.J., Portnoy, M.E., Dietrich, N.L., Akhter, N., Ayele, K., Benjamin, B., Cariaga, K., Brinkley, C.P., Brooks, S.Y., Granite, S., Guan, X., Gupta, J., Haghighi, P., Ho, S.L., Huang, M.C., Karlins, E., Laric, P.L., Legaspi, R., Lim, M.J., Maduro, Q.L., Masiello, C.A., Mastrian, S.D., McCloskey, J.C., Pearson, R., Stantripop, S., Tiongson, E.E., Tran, J.T., Tsurgeon, C., Vogt, J.L., Walker, M.A., Wetherby, K.D., Wiggins, L.S., Young, A.C., Zhang, L.H., Osoegawa, K., Zhu, B., Zhao, B., Shu, C.L., De Jong, P.J., Lawrence, C.E., Smit, A.F., Chakravarti, A., Haussler, D., Green, P., Miller, W. and Green, E.D. (2003) Comparative analyses of multi-species sequences from targeted genomic regions. Nature, 424: 788-793.
[29] Lecompte, O., Thompson, J.D., Plewniak, F., Thierry, J. and Poch, O. (2001) Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene, 270: 17-30.
[30] Batzoglou, S., Pachter, L., Mesirov, J. P., Berger, B. and Lander, E. S. (2000) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res., 10: 950-958.
[31] Chen, R., Bouck, J. B., Weinstock, G. M. and Gibbs, R. A. (2001) Comparing vertebrate whole-genome shotgun reads to the human genome. Genome Res., 11: 1807-1816.
[32] Mouse Genome Sequencing Consortium. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420: 520-562.
[33] Dubchak, I., Brudno, M., Loots, G.G., Pachter, L., Mayor, C., Rubin, E.M. and Frazer, K.A. (2000) Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res., 10: 1304-1306.
[34] Hardison, R.C. (2000) Conserved noncoding sequences are reliable guides to regulatory elements. Trends. Genet., 16: 369-372.
[35] Pennacchio, L.A. and Rubin, E.M. (2001) Genomic strategies to identify mammalian regulatory sequences. Nature Rev. Genet., 2: 100-109.
[36] International Human Genome Sequencing Consortium. (2001) Initial sequencing and analysis of the human genome. Nature, 409: 860-921.
[37] Oliphant, A.R., Brandl, C.J. and Struhl, K. (1989) Defining the sequence specificity of DNA-binding proteins by selecting binding sites from random-sequence oligonucleotides: analysis of yeast GCN4 protein. Mol. Cell Biol., 9: 2944-2949.
[38] Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., Kloos, D.U., Land, S., Lewicki-Potapov, B., Michael, H., Munch, R., Reuter, I., Rotert, S., Saxel, H., Scheer, M., Thiele, S. and Wingender, E. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic. Acids. Res., 31: 374-378.
[39] Ghosh, D. (1993). Status of the transcription factors database (TFD). Nucleic. Acids. Res., 24: 238-241.
[40] Chen, Q.K., Hertz, G.Z. and Stormo, G.D. (1995) MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices. Comput. Appl. Biosci., 11: 563-566.
[41] Prestridge, D.S. (1991) SIGNAL SCAN: A computer program that scans DNA sequences for eukaryotic transcriptional elements. Comput. Appl. Biosci., 7: 203-206.
[42] Nakabayashi, H., Koyama, Y., Sakai, M., Li, H.M., Wong, N.C. and Nishi, S. (2001) Glucocorticoid stimulates primate but inhibits rodent alpha-fetoprotein gene promoter. Biochem. Biophys. Res. Commun., 287: 160-172.
[43] Winter, H., Langbein, L., Krawczak, M., Cooper, D.N., Jave-Suarez, L.F., Rogers, M.A., Praetzel, S., Heidt, P.J. and Schweizer, J. (2001) Human type I hair keratin pseudogene phi-hHaA has functional orthologs in the chimpanzee and gorilla: evidence for recent inactivation of the human gene after the Pan-Homo divergence. Hum. Genet., 108: 37-42.
[44] Koppel, D.A., Wolfe, S.A., Fogelfeld, L.A., Merchant, P.S., Prouty, L. and Grimes, S.R. (1994) Primate testicular histone H1t genes are highly conserved and the human H1t gene is located on chromosome 6. J. Cell Biochem., 54: 219-230.
[45] Vallejo, A.N. and Pease, L.R. (1995) Structure of the MHC A and B locus promoters in hominoids. Insights on the evolution of the class I MHC multigene family. J. Immunol., 154: 3912-3921.
[46] Clarimon, J., Andres, A.M., Bertranpetit, J. and Comas, D. (2004) Comparative analysis of Alu insertion sequences in the APP 5' flanking region in humans and other primates. J. Mol. Evol., 58: 722-731.
[47] Mummidi, S., Bamshad, M., Ahuja, S.S., Gonzalez, E., Feuillet, P.M., Begum, K., Galvis, M.C., Kostecki, V., Valente, A.J., Murthy, K.K., Haro, L., Dolan, M.J., Allan, J.S. and Ahuja, S.K. (2000) Evolution of human and non-human primate CC chemokine receptor 5 gene and mRNA. Potential roles for haplotype and mRNA diversity, differential haplotype-specific transcriptional activity, and altered transcription factor binding to polymorphic nucleotides in the pathogenesis of HIV-1 and simian immunodeficiency virus. J. Biol. Chem., 275: 18946-18961.
[48] Yoshimura, K., Nakamura, H., Trapnell, B.C., Dalemans, W., Pavirani, A., Lecocq, J.P. and Crystal, R.G. (1991) The cystic fibrosis gene has a 'housekeeping'-type promoter and is expressed at low levels in cells of epithelial origin. J. Biol. Chem., 266: 9140-9144.
[49] Reitsma, P.H., Bertina, R.M., Ploos van Amstel, J.K., Riemens, A. and Briet, E. (1988) The putative factor IX gene promoter in hemophilia B Leyden. Blood, 72: 1074-1076.
[50] Clarimon, J., Andres, A.M., Bertranpetit, J. and Comas, D. (1996) Isolation and characterization of the human mismatch repair gene hMSH2 promoter region. Hum. Genet., 97: 114-116.
[51] Sugawara, T., Lin, D., Holt, J.A., Martin, K.O., Javitt, N.B., Miller, W.L. and Strauss, J.F. III (1995) Structure of the human steroidogenic acute regulatory (StAR) gene: StAR stimulates mitochondrial cholesterol 27-hydroxylase activity. Biochemistry, 34: 12506-12512.
[52] Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic. Acids. Res., 22: 4673-4680.
[53] Felsenstein, J. (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics, 5: 164-166.
[54] Horiike, T., Hamada, K., Kanaya, S. and Shinozawa, T. (2001) Origin of eukaryotic nuclei by symbiosis of Archaea in Bacteria is revealed by homology-hit analysis. Nat. Cell Biol., 3: 210-214.
[55] Pai, T.-W., Chang, W.-Y., Chang, M.D.-T., Chu, J.-H. and Tai, H.L. (2004) Ladderlike Stepping and Interval Jumping Searching Algorithm for DNA Sequences. In Proc. Second Asia-Pacific Bioinformatics Conference (APBC2004), 29: 93-98.
[56] Crooks, G.E., Hon, G., Chandonia, J.M. and Brenner, S.E. (2004) WebLogo: A sequence logo generator. Genome Res., 14: 1188-1190.
[57] Schneider, T.D. and Stephens, R.M. (1990) Sequence Logos: A New Way to Display Consensus Sequences. Nucleic. Acids. Res., 18: 6097-6100.
[58] Quandt, K., Frech, K., Karas, H., Wingender, E. and Werner, T. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic. Acids. Res. 23: 4878-4884.
[59] Chen, Q.K., Hertz, G.Z. and Stormo, G.D. (1995) Matrix search 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices. Comput. Appl. Biosci., 11: 563-566.
[60] Felsenstein, J. (1989) PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics, 5: 164-166.
[61] Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W.W. and Lenhard, B. (2004) JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic. Acids. Res., 32: D91-D94.
[62] Ludwig, M.Z., Bergman, C., Patel, N.H. and Kreitman, M. (2000) Evidence for stabilizing selection in a eukaryotic enhancer element. Nature, 403: 564-567.
[63] Grabe, N. (2002) AliBaba2: Context Specific Identification of Transcription Factor Binding Sites. In Silico. Biol., 2: S1-1.
[64] Schug, J. and Overton, G.C. (1997) TESS: Transcription Element Search Software on the WWW. Technical Report CBIL-TR-1997-1001-v0.0, Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania.
[65] Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic. Acids. Res., 25: 3389-3402.
Table 2.1 Prediction programs for transcription element binding sites
Columns: Program; Operating principle; Technical data; URL; Reference

Program: AliBaba2
Operating principle: Predicts transcription factor binding sites by constructing matrices on the fly from TRANSFAC 4.0 public.
Technical data: The matrices are constructed from all binding sites in the TRANSFAC database [38].
URL: http://www.alibaba2.com/
Reference: [63]

Program: Match 1.0 (Matrix search)
Operating principle: The MonoMatch Profiler and DiMatch Profiler provide a means of creating, editing and deleting matrix profiles, which are specific subsets of weight matrices with defined cut-offs.
Technical data: Both profilers use the library of mono- and di-nucleotide weight matrices from TRANSFAC 3.5 [38].
URL: http://compel.bionet.nsc.ru/Match/Match.html
Reference: [40]

Program: MatInspector
Operating principle: Uses a library of matrix descriptions for transcription factor binding sites to locate matches in sequences of unlimited length.
Technical data: The matrix family library contains 592 weight matrices in six groups from the TRANSFAC database [38].
URL: http://www.genomatix.de/products/MatInspector/
Reference: [58]

Program: Signal Scan (Signal Sequence Scan)
Operating principle: Searches for transcription factor binding sites using a signal database.
Technical data: The signal database is derived from the TFD (a), TRANSFAC [38] and IMD (b) matrix databases.
URL: http://bimas.dcrt.nih.gov/molbio/signal/
Reference: [41]

Program: TESS (Transcription Element Search System)
Operating principle: A web tool for identifying binding sites using site or consensus strings and positional weight matrices from databases.
Technical data: Data are drawn from the TRANSFAC [38], IMD (b) and CBIL-GibbsMat (c) databases.
URL: http://www.cbil.upenn.edu/tess
Reference: [64]

Program: TFBLAST
Operating principle: A tool for searching the TRANSFAC Factor Table using the BLAST (BLASTX and BLASTP) algorithms.
Technical data: Data are drawn from the TRANSFAC database [38].
URL: http://www.gene-regulation.com/pub/programs.html
Reference: [65]

(a) The Transcription Factor Database (TFD), maintained by David Ghosh [39], is available at GCG and NIH.
(b) The Information Matrix Database (IMD), maintained by Dr. Qing Chen [40], is available from http://bimas.dcrt.nih.gov/molbio/signal/
(c) The CBIL-GibbsMat database contains weight matrices built from the TRANSFAC database and is available from http://www.cbil.upenn.edu/tess.
Table 3.1 Summary of the primate, human and rodent promoter sequences
Gene name Primate Human Rodents Accession number
5-HTT Hylobates muelleri(5)
AB061809 AB061810 AB061811 AB061812 AB061813
Gorilla gorilla(3)
AB061803 AB061804 AB061805
Pan troglodytes AB061802
Pongo pygmaeus(3)
AB061806 AB061807 AB061808
○
AB061799 AB061800 AB061801
11 beta hydroxylase Papio ursinus U52081
AFP Gorilla gorilla AB053570
Pan troglodytes AB053571
Pongo pygmaeus AB053569
○
AB053572 L34019 Z19532
○a AB053573
○b AB053574
Apolipoprotein (a) Pan troglodytes AY028467
APP Erythrocebus patas AY242978
Gorilla gorilla AY242976
Macaca mulatta AF067971
Mandrillus leucophaeus AY242979
Pan troglodytes AY242975
Pongo pygmaeus AY242977
○ AY242974
BMP2 Cercopithecus aethiops(2) AY494189
AY494190
Saimiri sciureus AY494188
○ NM_001200
○a AF074942
Brain-2 / N-Oct 3 Gorilla gorilla AB037479
Pan troglodytes(2) AB037477
AB037478
Pongo pygmaeus AB037480
C-III Macaca fascicularis X77900
CCR2 Macaca mulatta AY083266
Macaca radiata AY083265
○ AF068265
CCR5 Ateles geoffroyi AF252595
Aotus trivirgatus AF252594
Callithrix jacchus AF252597
Cercocebus torquatus atys AF115964
Colobus guereza AF252570
AF252569
Erythrocebus patas AF252573
Gorilla gorilla AF252560
Macaca fascicularis AF252565
Macaca mulatta(2) AF252567
AF252568
Macaca nemestrina(2) AF115963
AF252566
Pan troglodytes(5)
AF109384 AF252556 AF252557 AF252558 AF252559
Papio anubis(3)
AF252562 AF252563 AF252564
Pongo pygmaeus AF252561
○ AF246924
AF032132
CFTR Hylobates lai X95930
Macaca fascicularis X95929
Saimiri sciureus X95928
○ M58478
Cytochrome p450
2E1 Erythrocebus patas U82752
Macaca fascicularis AF053081
○ D10014
DARC Macaca fascicularis AF320031
Macaca mulatta AF320032
Differentiation-dependent A4 protein
Gorilla gorilla AB041368
Pan troglodytes AB041367
Pongo pygmaeus AB041369
DPYD Macaca fascicularis AF216268
Macaca mulatta AF216267
DQB1 Pan troglodytes(8)
AF098646 AF098647 AF098648 AF098649 AF098650 AF098651 AF098652 AF098653
DQB2 Pan troglodytes(5)
AF098654 AF098655 AF098656 AF098657 AF098658
○ AF098660
DRA Aotus trivirgatus X5962
Gorilla gorilla X59619
Macaca fascicularis X59621
Pan troglodytes X57759
Papio hamadryas X59620
Epsilon-globin Pongo pygmaeus X05035
Factor IX Macaca fascicularis(2) X54634
X65473
Pan troglodytes X65472
○ X55008
Fmr1 Macaca arctoides AF251350
Pan troglodytes AF251349
Gogo-A Gorilla gorilla(6)
L32849 L32850 L38648 L38649 L38650 L38651
Gogo-B Gorilla gorilla(4)
L32847 L32848 L38647 L38652
GPHA Aotus trivirgatus AF401995
Callicebus moloch AF401996
Colobus guereza AF401993
Pongo pygmaeus AF401992
Presbytis obscura AF401994
Tarsius bancanus AF401997
HaA Gorilla gorilla AJ289713
Pan troglodytes AJ289712
Histone H1t Macaca mulatta M97756
Huntington's
Disease Gorilla gorilla Y07988
Pan troglodytes(2) Y07989
Y07990
IFN-gamma Callithrix jacchus X64659
Cercocebus torquatus atys AY486429
Macaca mulatta AY486428
IGFBP-1 Papio anubis AY095345
IL-4 Cercocebus torquatus atys AY486435
Macaca mulatta(2) AY083267
AY486434
Macaca radiata AY083268
IL-10 Cercocebus torquatus atys AY486432
Macaca mulatta AY486436
IL-12 p40 Macaca mulatta AY486436
INMT Gorilla gorilla AB041364
Pan troglodytes AB041363
Pongo pygmaeus AB041365
LCT Pan troglodytes AF282888
LPA Cercopithecus aethiops AY192774
Colobus guereza
kikuyuensis AY192776
Erythrocebus patas AY192775
Gorilla gorilla(2) AY192784
AY192785
Macaca mulatta AY192772
Macaca nemestrina AY192771
Pan troglodytes AY192769
Pan paniscus AY192773
Papio hamadryas AY192770
MAO A Gorilla gorilla AB042831
Macaca mulatta AJ544234
Pan troglodytes AB042830
Pongo pygmaeus AB042832
MCP1 Callithrix jacchus AF493701
Macaca radiata AF493700
Papio hamadryas AF493699
○ AF493697
AF493698
MC1R Gorilla gorilla AF387968
Pan troglodytes AF387969
MID1 Callithrix jacchus(2) AY112908
AY112912
MSH2 Callithrix penicillata AJ002053
Cercopithecus patas AJ002049
Gorilla gorilla AJ002050
Pan troglodytes AJ002051
Pongo pygmaeus AJ002052
○ U23824
Nerve growth
factor Gorilla gorilla AB037485
Pan troglodytes(2) AB037483
AB037484
Pongo pygmaeus AB037486
Neurofilament M Gorilla gorilla AB042835
Pan troglodytes AB042834
Pongo pygmaeus AB042836
○ AB042833
Paan-AG Papio anubis anubis(4) AY434097
AY434102 AY434094 AY434096
Patr-A Pan troglodytes L32846
L32856 L32857
Patr-B Pan troglodytes(2) L32845
L32855
PON1 Pongo pygmaeus AB089303
Pan troglodytes AB089302
Popy-A Pongo pygmaeus(4) L32843
L32844 L32853 L32854
Popy-B Pongo pygmaeus(2) L32842
L32858
protein C Callithrix jacchus U77649
Cebus apella U77652
Gorilla gorilla U77648
Macaca fascicularis U77654
Macaca mulatta U77651
Pan troglodytes U77647
Papio hamadryas U77646
Pongo pygmaeus U77650
○b U77653
PSP94 Papio hamadryas U64888
RB1 Pan troglodytes AF336015
RHAG Gorilla gorilla AF177628
Hylobates sp. AF177630
Macaca mulatta AF177631
Pan troglodytes AF177627
Papio hamadryas AF177632
Pongo pygmaeus AF177629
○ AF178844
○a AB036994
SLC6A4 Macaca mulatta AF191557
StAR Macaca mulatta AY007224
○ U29098
TNF Aotus trivirgatus AF195668
Gorilla gorilla AF195664
Hylobates lai AF195666
Macaca mulatta AF195667
Pan troglodytes AF195663
Pongo pygmaeus AF195665
Saimiri sciureus AY208942
TNFA Cercocebus torquatus atys AY486431
Colobus guereza U42765
Gorilla gorilla U42763
Macaca mulatta AY486430
Pan troglodytes U42626
Papio ursinus AF027198
Pongo pygmaeus U42764
○ L11698
U42625
UGT1A1 Colobus guereza AF135469
Gorilla gorilla(2) AF135465
AF135464
Pan paniscus AF135462
Pan troglodytes AF135463
Papio cynocephalus AF135468
Pongo pygmaeus AF135466
○ AF180372
VHL Gorilla gorilla AF291825
Macaca fascicularis AF291827
Pan troglodytes AF291824
Papio anubis AF291826
○ AF010238
XRCC1 Papio hamadryas AF019114
○a: Mus musculus
○b: Rattus norvegicus
Flowchart boxes: Retrieve primate promoter sequences from NCBI; Obtain human and rodent promoter sequences; Multiple sequence alignments; Screen TRANSFAC database; Perform PCMC program; Homology-hit analysis using BLASTn; Develop a visualization tool; Create sequence logos; Detect putative promoter regulatory elements.
Figure 3.1 The flowchart of this study
Figure 3.2 The procedure of the visualization tool for transcription element binding sites
The overall execution procedure is as follows:
1. Choose input channel 2 for the input data.
2. Enter or edit the segmental sequence input.
3. Read through the selected file, ignoring comment lines that start with ">" and blank lines, and filtering the remaining lines so that only the characters A, T, C and G are kept.
4. Check whether the segmental sequence input occurs in the selected file.
5. Generate the results in graphical format.
In the output, the starting coordinates of each segmental sequence are marked on the input sequences, as shown below.
Figure 3.3 The output format
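Steps 3 and 4 of the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation: the function names `read_sequence` and `find_segment` are hypothetical, coordinates are assumed to be 1-based, and overlapping occurrences are assumed to be reported.

```python
import re


def read_sequence(lines):
    """Join sequence lines, skipping '>' comment lines and blank lines,
    and keeping only the letters A, T, C and G (step 3)."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        # Drop every character that is not one of the four bases.
        parts.append(re.sub(r"[^ATCG]", "", line.upper()))
    return "".join(parts)


def find_segment(sequence, segment):
    """Return the 1-based start coordinates of every occurrence of
    `segment` in `sequence`, including overlapping ones (step 4)."""
    segment = segment.upper()
    if not segment:
        return []
    starts = []
    pos = sequence.find(segment)
    while pos != -1:
        starts.append(pos + 1)  # assumed 1-based coordinates
        pos = sequence.find(segment, pos + 1)
    return starts
```

For a file opened with `open(path)`, `read_sequence(handle)` yields the cleaned sequence, and `find_segment(sequence, "CCAAT")` gives the coordinates that the graphical output would mark.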
Table 4.1 Sequence substitution divergence in the nonhuman primate promoter genes
Genetic distances (transitions + transversions)
Gene name: Kimura two-parameter; p-distance
Same gene same species
DQB1: d = 0.052, SD* = 0.007; d = 0.049, SD = 0.006
UGT1A: d = 0.060, SD = 0.017; d = 0.056, SD = 0.015
TNF: d = 0.067, SD = 0.006; d = 0.062, SD = 0.006
Protein C: d = 0.062, SD = 0.007; d = 0.058, SD = 0.007
LPA: d = 0.057, SD = 0.004; d = 0.054, SD = 0.004
5-HTT: d = 0.071, SD = 0.011; d = 0.066, SD = 0.010
Same gene different species
CCR5: d = 0.051, SD = 0.005; d = 0.048, SD = 0.005
Mean distance: d = 0.061, SD = 0.008; d = 0.057, SD = 0.007
* SD: standard deviation
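The two distance measures in Table 4.1 can be illustrated with a minimal Python sketch. It uses the standard definitions: the p-distance is the proportion of differing sites, and the Kimura two-parameter distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function names are hypothetical, and skipping every column that contains a gap or ambiguous base is an assumption; the values in the table were computed with MEGA, whose gap handling may differ.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for two aligned
    sequences, skipping columns with a gap or ambiguous base (assumption)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def p_distance(seq1, seq2):
    """Proportion of compared sites that differ."""
    sites, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / sites


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance from transition (P) and
    transversion (Q) proportions."""
    sites, ts, tv = count_differences(seq1, seq2)
    p, q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Because the Kimura correction accounts for multiple substitutions at the same site, it is never smaller than the p-distance for the same pair, which matches the pattern of the two columns in the table.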
Table 4.2 Summary of putative binding sites identified from the TRANSFAC database
The putative binding sites of genes identified from the TRANSFAC database were counted in both regions. Genes with only one sequence published were excluded.
Columns: Gene name; Conserved region (binding site, binding factor); Less-conserved region (binding site, binding factor); Sequence number
11 beta hydroxylase (20,18)* 1
5-HTT 5 3 73 6 12
AFP 29 23 0 0 3
Apolipoprotein (a) (84, 0) 1
APP 8 4 357 69 6
BMP2 0 0 150 26 3
Brain-2 / N-Oct 3 50 16 16 6 4
C-III (95, 29) 1
CCR2 13 10 1 1 2
CCR5 0 0 20 14 22
CFTR 57 28 132 34 3
Cytochrome p450 2E1 homolog 9 5 45 24 2
DARC 11 5 0 0 2
Differentiation-dependent A4 protein 40 23 2 2 3
DPYD 86 32 0 0 2
DQB1 30 19 20 14 8
DQB2 31 17 6 6 5
DRA 46 29 6 6 5
Epsilon-globin (229, 42) 1
Factor IX 8 7 3 3 3
Fmr1 24 10 8 6 2
Gogo-A 11 11 26 13 6
Gogo-B 28 20 5 4 4
GPHA 0 0 31 22 6
HaA 42 21 11 8 2
Histone H1t (41, 12) 1
Huntington's Disease 23 14 0 0 3
IFN-gamma 32 19 107 32 3
IGFBP-1 (242, 44) 1
IL-10 75 28 1 1 2
IL-12 p40 (34, 15) 1
IL-4 24 17 35 13 4
INMT 9 7 3 3 3
LCT (43, 19) 1
LPA 19 9 128 41 10
MAO A 43 16 57 19 4
MCP1 9 3 17 12 3
MC1R 368 50 66 22 2
MID1 0 0 16 14 2
MSH2 22 10 37 14 5
Nerve growth factor 23 12 14 6 4
Neurofilament M 19 11 13 4 3
Paan-AG 15 8 98 37 4
Patr-A 23 18 4 3 3
Patr-B 29 21 1 1 2
PON1 72 28 6 4 2
Popy-A 19 16 13 10 4
Popy-B 29 21 2 2 2
Protein C 12 7 24 13 8
PSP94 (60, 28) 1
RB1 (161, 36) 1
RHAG 21 8 47 22 6
SLC6A4 (74, 9) 1
Steroidogenic acute regulatory protein (82, 21) 1
TNF 14 8 61 21 7
TNFA 0 0 144 35 7
UGT1A1 2 2 17 13 7
VHL 8 6 31 19 4
XRCC1 (182, 33) 1
Total 1439 82 1854 89 222
*: (binding site numbers, binding factor numbers)
Figure 4.1 (a) The distribution of the number of species per gene available for nonhuman primates. (b) The distribution of the number of genes per species available for nonhuman primates
(a) x-axis: gene name (labels shown: IGFBP-1, LCT, XRCC1, epsilon-globin, PSP94, DARC, Patr-B, MID1, DPYD, BMP2, IFN-gamma, neurofilament M, factor IX, Gogo-B, VHL, DQB2, RHAG, Gogo-A, TNF, LPA); y-axis: number (0 to 25).
(b) x-axis: species (labels shown: Tarsius bancanus, Mandrillus leucophae, Ateles geoffroyi, Presbytis obscura, Macaca arctoides, Cebus apella, Hylobates lai, Colobus guereza kiku, Macaca radiata, Saimiri sciureus, Colobus guereza, Aotus trivirgatus, Papio anubis, Callithrix jacchus, Macaca fascicularis, Pongo pygmaeus, Pan troglodytes); y-axis: number (0 to 55).
Figure 4.2 Phylogenetic trees of Gorilla
The trees were built with the neighbor-joining algorithm and bootstrap analysis in MEGA version 2.1. Evolutionary distances were estimated from a p-distance matrix.
Tree leaves (accession number_gene): AY192785_LPA, AY192784_LPA, AY242976_APP, AB053570_AFP, AJ289713_HaA, AF135465_UGT1A1, AF135464_UGT1A1, AF387968_MC1R, AF291825_VHL, AF252560_CCR5, X59619_DRA, AF177628_RHAG, AB042831_MAO A, U42763_TNFA, AF195664_TNF, AB061804_5-HTT, AB061803_5-HTT, AB061805_5-HTT, AB041364_INMT, U77648_Protein C, AB037485_Nerve growth factor, AJ002050_MSH2, AB037479_Brain-2/N-Oct-3, AB042835_Neurofilament M, AB041368_Differentiation-dependent A4 protein, Y07988_Huntington's disease, L38651_Gogo-A, L38647_Gogo-B, L38652_Gogo-B, L32847_Gogo-B, L32848_Gogo-B, L38649_Gogo-A, L32849_Gogo-A, L38648_Gogo-A, L38650_Gogo-A, L32850_Gogo-A. Scale bar: 0.1.
Figure 4.3 The relationship between hit numbers and E-values (-log E-value scale) using the BLAST program
(A)
CLUSTAL W (1.81) multiple sequence alignment

L32850_Gorilla_gorilla TCTCCGCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L38650_Gorilla_gorilla TCTCCGCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L38649_Gorilla_gorilla TCTCCTCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L38648_Gorilla_gorilla TCTCCGCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L32849_Gorilla_gorilla TCTCCGCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L38651_Gorilla_gorilla TCTCCGCAGTTTCTCTTCTCCCTCTCCCAACTTATGTAGGGTCCTTCTTC
***** ********* *** **** **** * ** ************

L32850_Gorilla_gorilla CTAGATACTCACGACGCGGACCCAGTTCTCACTGCCATTGGGTGTCGGGT
L38650_Gorilla_gorilla CTAGATACTCACGACGCGGACCCAGTTCTCACTGCCATTGGGTGTCGGGT
L38649_Gorilla_gorilla CTAGATACTCACGACGCGGTGCCAGTTCTCACTGCCATTGGGTGTCGGGT
L38648_Gorilla_gorilla CTAGATACTCACGACGCGGACCCAGTTCTCACTGCCATTGGGTGTCGGGT
L32849_Gorilla_gorilla CTAGATACTCACGAAGCGGACCCAGTTCTCACTGCCATTGGGTGTCGGGT
L38651_Gorilla_gorilla CTGGACACTCAGGATGTGGACTCAGTTCTCACTCCCATTTGGTGTCGGGT
** ** ***** ** * ** *********** ***** **********

CAAT box TATA box
L32850_Gorilla_gorilla TTCTAGAGAAG-CCAATCAGTGTCATCGCGGT-CCCGGTTCTAAAGTCCC
L38650_Gorilla_gorilla TTCTAGAGAAG-CCAATCAGTGTCATCGCGGT-CCCGGTTCTAAAGTCCC
L38649_Gorilla_gorilla TTCTAGAGAAGACCAATCAGTGTCATCTCGGTGTCCGGTTCTAAAGTCCC
L38648_Gorilla_gorilla TTCTAGAGAAG-CCAATCAGTGTCATCGCGGT-CCCGGTTCTAAAGTCCC
L32849_Gorilla_gorilla TTCTAGAGAAG-CCAATCAGTGTCATCGCGGT-CCCGGTTCTAAAGTCCC
L38651_Gorilla_gorilla TTCTAGCGAAG-CCAATCGGCGTCGCTGGGGTCCCTGTTCCAGAAGTCCC
****** **** ****** * *** *** * * * * *******

L32850_Gorilla_gorilla CAGGCACCCACCCGGCCTCAGATTCTCCCCAGACGCCCGCG
L38650_Gorilla_gorilla CAGGCACCCACCCGGCCTCAGATTCTCCCCAGACGCCCGCG
L38649_Gorilla_gorilla CAGGCACCCACCCGGCCTCAGATTCTCTCCAGACACCGAGG
L38648_Gorilla_gorilla CAGGCACCCACCCGGCCTCAGATTCTCCCCAGACGCCGAGG
L32849_Gorilla_gorilla CAGGGAACCACCCGGACTCAGATTCTCCCCAGACGCCGAGG
L38651_Gorilla_gorilla CGCGAACACATTGGGACTCAGATTCTCCCCAGACGCCGAGG
* * * ** ** *********** ****** ** *

(B)
Clustal Distance Matrix
     (1)    (2)    (3)    (4)    (5)
(2)  0.054
(3)  0.016  0.038
(4)  0.016  0.038  0.000
(5)  0.038  0.059  0.022  0.022
(6)  0.199  0.225  0.183  0.183  0.183
(1) L32850_Gorilla gorilla (2) L38649_Gorilla gorilla (3) L38650_Gorilla gorilla (4) L38648_Gorilla gorilla (5) L32849_Gorilla gorilla (6) L38651_Gorilla gorilla

(C)
Figure 4.4 Three kinds of output results for the Gogo-A promoter sequence using the ClustalW program
(A) Intra-specific alignment of promoter sequences of the Gogo-A gene from gorilla. The intra-specific conserved TATA box and CAAT box are clearly identifiable. (B) The score matrix of pairwise distances shows the similarity between each pair of sequences. (C) Phylogenetic tree generated by PHYLIP's Drawgram.
(A)
CLUSTAL W (1.81) multiple sequence alignment

AFP_Pongo_pygmaeus AGTTTGAGGAGAATATTTGTTATATTTGCAAAATAAAATAAGTTTGCAAG
AFP_Pan_troglodytes AGTTTGAGGAGAATATTTGTTATATTTGCAAAATAAAATAAGTTTGCAAG
AFP_Gorilla_gorilla AGTTTGAGGAGAATATTTGTTATATTTGCAAAATAAAATAAGTTTGCAAG
AFP_Homo_sapiens ---
AFP_Mus_musculus ---

AFP_Pongo_pygmaeus TTTTTTTT-CTGCCCCAAAGAGGTCTGTGTCCTTGAACATAAAATACAAA
AFP_Pan_troglodytes TTTTTTTTTCTGCCCCAAAGAGGTCTGTGTCCTTGAACATAAAATACAAA
AFP_Gorilla_gorilla TTTTTTTTTCTGCCCCAAAGAGGTCTGTGTCCTTGAACATAAAATACAAA
AFP_Homo_sapiens ---CAAAGAGCTCTGTGTCCTTGAACATAAAATACAAA
AFP_Mus_musculus ---TCTGAAGTGGTCTTTGTCCTTGAACATAGGATACAAG
*** * *** ************** ******

AFP_Pongo_pygmaeus TAACCGCTATGCTGTTAATTATTGGCAAATGTCCCATTTTCAACCTAAGG
AFP_Pan_troglodytes TAACCGCTATGCTGTTAATTATTGGCAAATGTCCCATTTTCAACCTAAGG
AFP_Gorilla_gorilla TAACCGCTATTCTGTTAATTATTGGCAAATGTCCCATTTTCAACCTAAGG
AFP_Homo_sapiens TAACCGCTATGCTGTTAATTATTGGCAAATGTCCCATTTTCAACCTAAGG
AFP_Mus_musculus TGACCCCTGCTCTGTTAATTATTGGCAAATTGCCTAACTTCAACGTAAGG
* *** ** ******************* ** * ****** *****

AFP_Pongo_pygmaeus AAATACCATAAAGTAACAGATATACCAACAAAAGGTTACTAGTTAACAGG
AFP_Pan_troglodytes AAATACCATAAAGTAACAGATATACCAACAAAAGGTTACTAGTTAACAGG
AFP_Gorilla_gorilla AAATACCATAAAGTAACAGATATACCAACAAAAGGTTACTAGTTAACAGG
AFP_Homo_sapiens AAATACCATAAAGTAACAGATATACCAACAAAAGGTTACTAGTTAACAGG
AFP_Mus_musculus AAATA---GAGTCATATGTTTGCTCACTGAAGGTTACTAGTTAACAGG
***** *** * * * * * ** *******************

AFP_Pongo_pygmaeus CATTGCCTGAAAAGAGTATAAAAGAATTTCAGCACGATTTTCC---ATAT
AFP_Pan_troglodytes CATTGCCTGAAAAGAGTATAAAAGAATTTCAGCACGATTTTCC---ATAT
AFP_Gorilla_gorilla CATTGCCTGAAAAGAGTATAAAAGAATTTCAGCACGATTTTCC---ATAT
AFP_Homo_sapiens CATTGCCTGAAAAGAGTATAAAAGAATTTCAGCATGATTTTCC---
AFP_Mus_musculus CATCCCTTAAACAGGATATAAAAGGACTTCAGCAGGACTGCTCGAAACAT
*** * * ** ** ******** * ******* ** * *

AFP_Pongo_pygmaeus TCTGCTTCCACCACTGCCAATAACAAAATAACTAGCAACCA
AFP_Pan_troglodytes TCTGCTTCCACCACTGCCAATAACAAAATAACTAGCAACCA
AFP_Gorilla_gorilla TCTGCTTCCACCACTGCCAATAACAAAATAACTAGCAACCA
AFP_Homo_sapiens ---
AFP_Mus_musculus CCCACTTCCAGCACTGCCTGCGGTGAAGGAACAAGCAGCCA

(B)
Clustal Distance Matrix
     (1)    (2)    (3)    (4)
(1)  0.000
(2)  0.000
(3)  0.003  0.003
(4)  0.011  0.011  0.017
(5)  0.273  0.273  0.268  0.254
(1) Pongo pygmaeus (2) Pan troglodytes (3) Gorilla gorilla (4) Homo sapiens (5) Mus musculus

(C)
Figure 4.5 Three kinds of output results for the AFP promoter sequence using the ClustalW program
(A) Inter-specific alignment of promoter sequences of the AFP gene from human, chimpanzee, orangutan, gorilla and mouse. The inter-specific conserved TATA box is clearly identifiable.
(B) The score matrix of pairwise distances shows the similarity between each pair of sequences. (C) Phylogenetic tree generated by PHYLIP's Drawgram.
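The row of asterisks under each alignment block marks the columns in which all sequences carry the same nucleotide. A minimal Python sketch of how such a line can be derived from one block is given below; the function name `conservation_line` is hypothetical, and treating a column that contains a gap as not conserved is an assumption of this sketch.

```python
def conservation_line(aligned):
    """Return a string with '*' under every column in which all aligned
    sequences have the same base, and ' ' elsewhere."""
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all sequences in a block must have equal length")
    marks = []
    for column in zip(*aligned):
        first = column[0].upper()
        # A gap column is never marked as conserved (assumption).
        same = first != "-" and all(base.upper() == first for base in column)
        marks.append("*" if same else " ")
    return "".join(marks)
```

Applied to the first ten columns of two of the Gogo-A sequences, `conservation_line(["TCTCCGCAGT", "TCTCCTCAGT"])` marks every column except the sixth, where G and T differ.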
Figure 4.6 Phylogenetic trees of lower-similarity genes
The trees were built with the neighbor-joining algorithm and bootstrap analysis in MEGA version 2.1. Evolutionary distances were estimated from a Kimura two-parameter matrix.
(BMP2) Tree leaves: Cercopithecus1, Saimiri, Cercopithecus2. Scale bar: 0.1.
(GPHA) Tree leaves: Colobus, Presbytis, Pongo, Callicebus, Aotus, Tarsius. Bootstrap values shown: 85, 90, 56. Scale bar: 0.02.
(IL-4) Tree leaves: Macaca1, Macaca2, Cercocebus, Macaca3. Bootstrap value shown: 37.
(VHL) Tree leaves: Gorilla, Papio, Pan, Macaca. Bootstrap value shown: 70. Scale bar: 0.05.
Figure 4.7 Output format of the AFP promoter gene using the PCMC program
Figure 4.8 Diagrams produced by the visualization tool for putative conserved regulatory element sites of higher-similarity promoter sequences, from the PCMC program (a), the TRANSFAC database (b) and multiple sequence alignment (c)
Panels: AFP (a, b, c); Brain-2 / N-Oct 3 (a, b, c); Nerve growth factor (a, b, c).
AFP, Brain-2 / N-Oct 3 and Nerve growth factor are higher-similarity (99-100%) promoter genes, in which putative conserved regulatory elements are densely distributed along the promoter sequence.
Figure 4.9 Diagrams produced by the visualization tool for putative conserved regulatory element sites of lower-similarity promoter sequences, from the PCMC program (a) and the TRANSFAC database (b)
Panels: APP (a, b); VHL (a, b); GPHA (a); TNFA (a).
APP, GPHA, TNFA and VHL are lower-similarity (28-55%) promoter genes that contain fewer conserved promoter regulatory elements and therefore show sparse, poorly aligned diagrams.
GPHA and TNFA have no putative regulatory elements from TRANSFAC in the 100% conserved region.
Figure 4.10 The distribution of putative regulatory elements in the conserved regions (a) and less-conserved regions (b)
Figure 4.11 Comparison of some consensus binding sequences from our study and the JASPAR database
Columns: Transcription factor; our study; JASPAR. Rows: AP-2, CREB, Sp1, SRF, IRF-2, c-Myb, GATA-1. Each cell is a sequence logo (x-axis: position; y-axis: bits).
Figure 4.12 Comparison of TATA box and CAAT box sequences from NCBI, the JASPAR database and our study
Columns: Regulatory element; NCBI (a); JASPAR; our study. Rows: TATA box, CAAT box. Each cell is a sequence logo (x-axis: position; y-axis: bits).
(a) The identifiable TATA and CAAT box sequences of nonhuman primate promoters from NCBI.
Columns: Class; Position frequency; Sequence logos; Transcription factor*
Class: AP related
AP-1: positions 1-7; A: 26 6 26; T: 26 20 4; C: 15 4 4 30; G: 4 15 4 20 6
AP-2: positions 1-6; A: 3 3 8 6 1 53; T: 54 3 9 3; C: 10 12 17 9 12; G: 1 50 43 20 46 12
Class: CAAT box related
alpha-CBF, alpha-IRP: positions 1-5; A: 28 24; T: 29; C: 32 31 3; G: 1 4 8
CBF-B: positions 1-5; A: 71 3 1 3 4; T: 6 61 72 8 6; C: 25 8 70 4; G: 12 8 8 77
CCAAT-binding factor: positions 1-5; A: 62 10 1 3 4; T: 6 53 66 13 6; C: 3 24 13 11 4; G: 17 1 8 61 74
CDP, H1TF2: positions 1-5; A: 28 28; T: 29; C: 32 31 3; G: 1 4 4
CP1: positions 1-5; A: 1 28 24; T: 29; C: 32 32 3 3; G: 2 3 7 8 3
CP2: positions 1-5; A: 17 28 24 17; T: 29; C: 32 31 17 17 3; G: 18 4 8
CTF: positions 1-5; A: 70 10 5 3 7; T: 10 61 74 13 6; C: 3 24 13 15 5; G: 17 5 8 69 82
Class: CAC related
CAC-binding factor: positions 1-5; A: 10 16 126 13; T: 6 21 29 13 9; C: 122 104 10 106 123; G: 40 37 13 46 46
CACCC-binding factor: positions 1-5; A: 12 159 7 1 6; T: 16 8 4 14 5; C: 149 9 143 164 159; G: 11 12 34 9 18
gammaCAC2: positions 1-5; A: 3 155 14 1 8; T: 19 18 5; C: 160 16 151 167 169; G: 16 4 19 5
Class: NF related
NF-1: positions 1-4; A: 50 11 50 386; T: 197 27 18 22; C: 106 427 383 32; G: 129 17 31 42
NF-1/L: positions 1-5; A: 4 7 3 2 65; T: 101 62 5 11 8; C: 17 57 3 73 51; G: 4 115 40 5
NF-E: positions 1-5; A: 10 6 68 30 3; T: 107 85 22 88 41; C: 13 44 10 89; G: 12 7 52 14 9
Class: c-Myc related
c-Myc: positions 1-6; A: 20 9; T: 15 4 5 31 37 4; C: 25 6 2 9 3 6; G: 10 24 30
Class: F2F related
F2F: positions 1-6; A: 3 34 31 39 34 6; T: 31 35; C: 3 1 1; G: 5 8 10 3 8
Class: GATA-1 related
GATA-1: positions 1-6; A: 33 14 81 41 22 46; T: 70 42 49 85 52 23; C: 29 38 1 43 47; G: 7 45 9 4 22 23
Class: GR related
GR: positions 1-5; A: 51 134 30 171 45; T: 103 14 136 98 23; C: 130 62 58 37 177; G: 63 137 123 41 102
Class: H4TF-2 related
H4TF-2: positions 1-5; A: 5 15 1 3 3; T: 1 1 36 5; C: 11 49 41; G: 46 36 4 3
Class: HiNF-A related
HiNF-A: positions 1-7; A: 56 21 52 34 50 18; T: 4 3 3 40; C: 21 3 18 10; G: 5 36 6 3 8 3 33
Class: LF-A1 related
LF-A1: positions 1-5; A: 3 2 2 40; T: 1 10 3; C: 1 35 4; G: 43 46 46 3 1
Class: PEA3 related
PEA3: positions 1-6; A: 27 6 3 24 30 31; T: 1; C: 6 5 7; G: 6 33 31 15 9
Class: Pit-1 related
Pit-1: positions 1-5; A: 28 62 57 71 30; T: 46 16 13 16 47; C: 9 11 5 16; G: 14 8 27 5 4
Class: Sp1 related
Sp1: positions 1-5; A: 33 4 14 132 33; T: 125 19 7 4 58; C: 102 110 189 134 173; G: 34 161 84 24 30
*: Data source: TRANSFAC database [38].
Figure 4.13 Sequence representation and position frequency of putative consensus regulatory elements in non-human primates
Position frequencies were counted from the putative binding site sequences in the conserved regions. Sequence logos give a graphical view of the regulatory element sequences.
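The counting step behind Figure 4.13 and the height of a sequence logo column can be sketched as follows. This is an illustration only: the function names are hypothetical, the example sites in the usage note are invented, and the information content is computed as 2 minus the Shannon entropy in bits, omitting the small-sample correction described by Schneider and Stephens [57].

```python
import math


def position_frequency_matrix(sites):
    """Count A, C, G and T at each position of equal-length
    binding-site sequences; returns one dict per position."""
    if not sites:
        return []
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all binding sites must have the same length")
    counts = [{base: 0 for base in "ACGT"} for _ in range(width)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            if base in counts[i]:  # ignore anything that is not A, C, G, T
                counts[i][base] += 1
    return counts


def information_content(column):
    """Information content of one position in bits (2 - entropy),
    without small-sample correction."""
    total = sum(column.values())
    if total == 0:
        return 0.0
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in column.values() if n
    )
    return 2.0 - entropy
```

For the invented sites `["TATAA", "TATAT", "TATAA", "TACAA"]`, the first position is always T and carries the full 2 bits, whereas the last position (three A, one T) carries about 1.19 bits, so its logo column would be shorter.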
Appendix I Taxonomy of primates from NCBI
Primates (order)
  Catarrhini
    Hominidae (family)
      Homo (genus; humans): Homo sapiens
      Pan (chimpanzees): Pan troglodytes, Pan paniscus
      Gorilla (gorillas): Gorilla gorilla
      Pongo (orangutans): Pongo pygmaeus
    Hylobatidae
      Hylobates (gibbons): Hylobates muelleri, Hylobates lai, Hylobates sp.
    Cercopithecidae (Old World monkeys)
      Cercopithecinae (subfamily)
        Cercocebus (mangabeys): Cercocebus torquatus atys
        Cercopithecus (guenons): Cercopithecus aethiops
        Erythrocebus (patas monkeys): Erythrocebus patas
        Macaca (macaques): Macaca arctoides, Macaca fascicularis, Macaca mulatta, Macaca nemestrina, Macaca radiata
        Mandrillus (forest baboons): Mandrillus leucophaeus
        Papio (baboons): Papio anubis, Papio cynocephalus, Papio hamadryas, Papio ursinus
      Colobinae (subfamily)
        Colobus (colobus monkeys): Colobus guereza
        Presbytis (leaf monkeys): Presbytis obscura
  Platyrrhini (New World monkeys)
    Callitrichidae (family)
      Callithrix (marmosets): Callithrix penicillata, Callithrix jacchus
    Cebidae
      Aotinae (night monkeys): Aotus trivirgatus
      Atelinae (spider monkeys): Ateles geoffroyi
      Callicebinae: Callicebus moloch (dusky titi)
      Cebinae: Cebus apella (capuchin monkey), Saimiri sciureus (squirrel monkey)
  Tarsii
    Tarsiidae
      Tarsius (tarsiers): Tarsius bancanus
Appendix II Primate images from Primate Info Net (a)
Pan troglodytes (b) (Chimpanzee (c))
Pan paniscus (Bonobo)
Gorilla gorilla (Gorilla)
Pongo pygmaeus (Orangutan)
Hylobates muelleri (Mueller's gibbon)
Hylobates lai (White-handed gibbon)
Hylobates sp. (Gibbon)
Cercocebus torquatus atys (Sooty mangabey)
Cercopithecus aethiops (African green monkey)
Erythrocebus patas (Red guenon)
Macaca arctoides (Stump-tailed macaque)
Macaca fascicularis (Crab-eating macaque)