
Biol., 2:13

Source: Chung Hua University (pp. 42-96)

[28] Thomas, J.W., Touchman, J.W., Blakesley, R.W., Bouffard, G.G., Beckstrom-Sternberg, S.M., Margulies, E.H., Blanchette, M., Siepel, A.C., Thomas, P.J., McDowell, J.C., Maskeri, B., Hansen, N.F., Schwartz, M.S., Weber, R.J., Kent, W.J., Karolchik, D., Bruen, T.C., Bevan, R., Cutler, D.J., Schwartz, S., Elnitski, L., Idol, J.R., Prasad, A.B., Lee-Lin, S.Q., Maduro, V.V., Summers, T.J., Portnoy, M.E., Dietrich, N.L., Akhter, N., Ayele, K., Benjamin, B., Cariaga, K., Brinkley, C.P., Brooks, S.Y., Granite, S., Guan, X., Gupta, J., Haghighi, P., Ho, S.L., Huang, M.C., Karlins, E., Laric, P.L., Legaspi, R., Lim, M.J., Maduro, Q.L., Masiello, C.A., Mastrian, S.D., McCloskey, J.C., Pearson, R., Stantripop, S., Tiongson, E.E., Tran, J.T., Tsurgeon, C., Vogt, J.L., Walker, M.A., Wetherby, K.D., Wiggins, L.S., Young, A.C., Zhang, L.H., Osoegawa, K., Zhu, B., Zhao, B., Shu, C.L., de Jong, P.J., Lawrence, C.E., Smit, A.F., Chakravarti, A., Haussler, D., Green, P., Miller, W. and Green, E.D. (2003) Comparative analyses of multi-species sequences from targeted genomic regions. Nature, 424: 788-793.

[29] Lecompte, O., Thompson, J.D., Plewniak, F., Thierry, J. and Poch, O. (2001) Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene, 270: 17-30.

[30] Batzoglou, S., Pachter, L., Mesirov, J.P., Berger, B. and Lander, E.S. (2000) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res., 10: 950-958.

[31] Chen, R., Bouck, J.B., Weinstock, G.M. and Gibbs, R.A. (2001) Comparing vertebrate whole-genome shotgun reads to the human genome. Genome Res., 11: 1807-1816.

[32] Mouse Genome Sequencing Consortium. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420: 520-562.

[33] Dubchak, I., Brudno, M., Loots, G.G., Pachter, L., Mayor, C., Rubin, E.M. and Frazer, K.A. (2000) Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res., 10: 1304-1306.

[34] Hardison, R.C. (2000) Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet., 16: 369-372.

[35] Pennacchio, L.A. and Rubin, E.M. (2001) Genomic strategies to identify mammalian regulatory sequences. Nature Rev. Genet., 2: 100-109.

[36] International Human Genome Sequencing Consortium. (2001) Initial sequencing and analysis of the human genome. Nature, 409: 860-921.

[37] Oliphant, A.R., Brandl, C.J. and Struhl, K. (1989) Defining the sequence specificity of DNA-binding proteins by selecting binding sites from random-sequence oligonucleotides: analysis of yeast GCN4 protein. Mol. Cell. Biol., 9: 2944-2949.

[38] Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R., Hornischer, K., Karas, D., Kel, A.E., Kel-Margoulis, O.V., Kloos, D.U., Land, S., Lewicki-Potapov, B., Michael, H., Munch, R., Reuter, I., Rotert, S., Saxel, H., Scheer, M., Thiele, S. and Wingender, E. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res., 31: 374-378.

[39] Ghosh, D. (1993) Status of the transcription factors database (TFD). Nucleic Acids Res., 24: 238-241.

[40] Chen, Q.K., Hertz, G.Z. and Stormo, G.D. (1995) MATRIX SEARCH 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices. Comput. Appl. Biosci., 11: 563-566.

[41] Prestridge, D.S. (1991) SIGNAL SCAN: A computer program that scans DNA sequences for eukaryotic transcriptional elements. Comput. Appl. Biosci., 7: 203-206.

[42] Nakabayashi, H., Koyama, Y., Sakai, M., Li, H.M., Wong, N.C. and Nishi, S. (2001) Glucocorticoid stimulates primate but inhibits rodent alpha-fetoprotein gene promoter. Biochem. Biophys. Res. Commun., 287: 160-172.

[43] Winter, H., Langbein, L., Krawczak, M., Cooper, D.N., Jave-Suarez, L.F., Rogers, M.A., Praetzel, S., Heidt, P.J. and Schweizer, J. (2001) Human type I hair keratin pseudogene phi-hHaA has functional orthologs in the chimpanzee and gorilla: evidence for recent inactivation of the human gene after the Pan-Homo divergence. Hum. Genet., 108: 37-42.

[44] Koppel, D.A., Wolfe, S.A., Fogelfeld, L.A., Merchant, P.S., Prouty, L. and Grimes, S.R. (1994) Primate testicular histone H1t genes are highly conserved and the human H1t gene is located on chromosome 6. J. Cell. Biochem., 54: 219-230.

[45] Vallejo, A.N. and Pease, L.R. (1995) Structure of the MHC A and B locus promoters in hominoids. Insights on the evolution of the class I MHC multigene family. J. Immunol., 154: 3912-3921.

[46] Clarimon, J., Andres, A.M., Bertranpetit, J. and Comas, D. (2004) Comparative analysis of Alu insertion sequences in the APP 5' flanking region in humans and other primates. J. Mol. Evol., 58: 722-731.

[47] Mummidi, S., Bamshad, M., Ahuja, S.S., Gonzalez, E., Feuillet, P.M., Begum, K., Galvis, M.C., Kostecki, V., Valente, A.J., Murthy, K.K., Haro, L., Dolan, M.J., Allan, J.S. and Ahuja, S.K. (2000) Evolution of human and non-human primate CC chemokine receptor 5 gene and mRNA. Potential roles for haplotype and mRNA diversity, differential haplotype-specific transcriptional activity, and altered transcription factor binding to polymorphic nucleotides in the pathogenesis of HIV-1 and simian immunodeficiency virus. J. Biol. Chem., 275: 18946-18961.

[48] Yoshimura, K., Nakamura, H., Trapnell, B.C., Dalemans, W., Pavirani, A., Lecocq, J.P. and Crystal, R.G. (1991) The cystic fibrosis gene has a 'housekeeping'-type promoter and is expressed at low levels in cells of epithelial origin. J. Biol. Chem., 266: 9140-9144.

[49] Reitsma, P.H., Bertina, R.M., Ploos van Amstel, J.K., Riemens, A. and Briet, E. (1988) The putative factor IX gene promoter in hemophilia B Leyden. Blood, 72: 1074-1076.

[50] Clarimon, J., Andres, A.M., Bertranpetit, J. and Comas, D. (1996) Isolation and characterization of the human mismatch repair gene hMSH2 promoter region. Hum. Genet., 97: 114-116.

[51] Sugawara, T., Lin, D., Holt, J.A., Martin, K.O., Javitt, N.B., Miller, W.L. and Strauss, J.F. III (1995) Structure of the human steroidogenic acute regulatory (StAR) gene: StAR stimulates mitochondrial cholesterol 27-hydroxylase activity. Biochemistry, 34: 12506-12512.

[52] Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22: 4673-4680.

[53] Felsenstein, J. (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics, 5: 164-166.

[54] Horiike, T., Hamada, K., Kanaya, S. and Shinozawa, T. (2001) Origin of eukaryotic nuclei by symbiosis of Archaea in Bacteria is revealed by homology-hit analysis. Nat. Cell Biol., 3: 210-214.

[55] Pai, T.-W., Chang, W.-Y., Chang, M.D.-T., Chu, J.-H. and Tai, H.L. (2004) Ladderlike Stepping and Interval Jumping Searching Algorithm for DNA Sequences. In Proc. Second Asia-Pacific Bioinformatics Conference (APBC2004), 29: 93-98.

[56] Crooks, G.E., Hon, G., Chandonia, J.M. and Brenner, S.E. (2004) WebLogo: A sequence logo generator. Genome Res., 14: 1188-1190.

[57] Schneider, T.D. and Stephens, R.M. (1990) Sequence Logos: A New Way to Display Consensus Sequences. Nucleic Acids Res., 18: 6097-6100.

[58] Quandt, K., Frech, K., Karas, H., Wingender, E. and Werner, T. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res., 23: 4878-4884.

[59] Chen, Q.K., Hertz, G.Z. and Stormo, G.D. (1995) Matrix search 1.0: a computer program that scans DNA sequences for transcriptional elements using a database of weight matrices. Comput. Appl. Biosci., 11: 563-566.

[60] Felsenstein, J. (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics, 5: 164-166.

[61] Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W.W. and Lenhard, B. (2004) JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res., 32: D91-D94.

[62] Ludwig, M.Z., Bergman, C., Patel, N.H. and Kreitman, M. (2000) Evidence for stabilizing selection in a eukaryotic enhancer element. Nature, 403: 564-567.

[63] Grabe, N. (2002) AliBaba2: Context Specific Identification of Transcription Factor Binding Sites. In Silico Biol., 2: S1-1.

[64] Schug, J. and Overton, G.C. (1997) TESS: Transcription Element Search Software on the WWW. Technical Report CBIL-TR-1997-1001-v0.0, Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania.

[65] Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25: 3389-3402.

Table 2.1 Programs for predicting transcription element binding sites

| Program | Operating principle | Technical data | URL | Reference |
|---|---|---|---|---|
| AliBaba2 | Predicts transcription factor binding sites by constructing matrices on the fly from the public TRANSFAC 4.0 release. | The matrices are constructed from all binding sites in the TRANSFAC database [38]. | http://www.alibaba2.com/ | [63] |
| Match 1.0 (Matrix search) | The MonoMatch and DiMatch Profilers provide a means of creating, editing and deleting matrix profiles, i.e. specific subsets of weight matrices with defined cut-offs. | Both profilers draw on the library of mono- and di-nucleotide weight matrices from TRANSFAC 3.5 [38]. | http://compel.bionet.nsc.ru/Match/Match.html | [40] |
| MatInspector | Uses a library of matrix descriptions of transcription factor binding sites to locate matches in sequences of unlimited length. | The matrix family library contains 592 weight matrices in six groups from the TRANSFAC database [38]. | http://www.genomatix.de/products/MatInspector/ | [58] |
| Signal Scan (Signal Sequence Scan) | Searches for transcription factor binding sites using a signal database. | The signal database is derived from the TFD^a, TRANSFAC [38] and IMD^b matrix databases. | http://bimas.dcrt.nih.gov/molbio/signal/ | [41] |
| TESS (Transcription Element Search System) | A web tool that identifies binding sites using sites, consensus strings and positional weight matrices from databases. | Data sources: TRANSFAC [38], IMD^b and CBIL-GibbsMat^c databases. | http://www.cbil.upenn.edu/tess | [64] |
| TFBLAST | Searches the TRANSFAC Factor Table with the BLAST algorithms (BLASTX and BLASTP). | Data source: TRANSFAC database [38]. | http://www.gene-regulation.com/pub/programs.html | [65] |

a The Transcription Factor Database (TFD), maintained by David Ghosh [39], is available at GCG and NIH.

b The Information Matrix Database (IMD), maintained by Dr. Qing Chen [40], is available from http://bimas.dcrt.nih.gov/molbio/signal/

c The CBIL-GibbsMat database contains weight matrices built from the TRANSFAC database and is available from http://www.cbil.upenn.edu/tess.
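Most programs in Table 2.1 (Match, MatInspector, Signal Scan, TESS) share one idea: score every window of a sequence against a position weight matrix and report windows above a cut-off. The Python sketch below shows that idea as a generic log2-odds scan under stated assumptions (uniform 0.25 background, pseudocount 0.5, forward strand only, and a toy four-position matrix that is not TRANSFAC data). It is not the scoring scheme of any listed program; those use their own core and matrix similarity scores and cut-off profiles.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_pwm(counts: Dict[str, List[int]], pseudocount: float = 0.5,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Convert position frequency counts (base -> counts per position)
    into a log2-odds position weight matrix against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((counts[b][i] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, float]]:
    """Score every forward-strand window; return (1-based start, score)
    for windows scoring at or above the threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N, gaps or other symbols
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start + 1, round(score, 3)))
    return hits


if __name__ == "__main__":
    # Toy counts for a hypothetical 4-bp "TATA" motif (illustrative only).
    toy_counts = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0],
                  "G": [0, 0, 0, 0], "T": [10, 0, 10, 0]}
    print(scan("GGTATAGG", log_odds_pwm(toy_counts), threshold=5.0))
```

With these toy counts the script prints `[(3, 7.229)]`: only the window starting at position 3 ("TATA") passes the cut-off. The threshold is an arbitrary illustration; real tools calibrate cut-offs per matrix.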

Table 3.1 Summary of the primate, human and rodent promoter sequences

| Gene name | Species (number of sequences) | Accession number(s) |
|---|---|---|
| 5-HTT | Hylobates muelleri (5) | AB061809, AB061810, AB061811, AB061812, AB061813 |
| | Gorilla gorilla (3) | AB061803, AB061804, AB061805 |
| | Pan troglodytes | AB061802 |
| | Pongo pygmaeus (3) | AB061806, AB061807, AB061808 |
| | Not specified | AB061799, AB061800, AB061801 |
| 11 beta hydroxylase | Papio ursinus | U52081 |
| AFP | Gorilla gorilla | AB053570 |
| | Pan troglodytes | AB053571 |
| | Pongo pygmaeus | AB053569 |
| | Not specified | AB053572, L34019, Z19532 |
| | Mus musculus | AB053573 |
| | Rattus norvegicus | AB053574 |
| Apolipoprotein (a) | Pan troglodytes | AY028467 |
| APP | Erythrocebus patas | AY242978 |
| | Gorilla gorilla | AY242976 |
| | Macaca mulatta | AF067971 |
| | Mandrillus leucophaeus | AY242979 |
| | Pan troglodytes | AY242975 |
| | Pongo pygmaeus | AY242977 |
| | Homo sapiens | AY242974 |
| BMP2 | Cercopithecus aethiops (2) | AY494189, AY494190 |
| | Saimiri sciureus | AY494188 |
| | Homo sapiens | NM_001200 |
| | Mus musculus | AF074942 |
| Brain-2 / N-Oct 3 | Gorilla gorilla | AB037479 |
| | Pan troglodytes (2) | AB037477, AB037478 |
| | Pongo pygmaeus | AB037480 |
| C-III | Macaca fascicularis | X77900 |
| CCR2 | Macaca mulatta | AY083266 |
| | Macaca radiata | AY083265 |
| | Homo sapiens | AF068265 |
| CCR5 | Ateles geoffroyi | AF252595 |
| | Aotus trivirgatus | AF252594 |
| | Callithrix jacchus | AF252597 |
| | Cercocebus torquatus atys | AF115964 |
| | Colobus guereza | AF252570, AF252569 |
| | Erythrocebus patas | AF252573 |
| | Gorilla gorilla | AF252560 |
| | Macaca fascicularis | AF252565 |
| | Macaca mulatta (2) | AF252567, AF252568 |
| | Macaca nemestrina (2) | AF115963, AF252566 |
| | Pan troglodytes (5) | AF109384, AF252556, AF252557, AF252558, AF252559 |
| | Papio anubis (3) | AF252562, AF252563, AF252564 |
| | Pongo pygmaeus | AF252561 |
| | Homo sapiens | AF246924, AF032132 |
| CFTR | Hylobates lai | X95930 |
| | Macaca fascicularis | X95929 |
| | Saimiri sciureus | X95928 |
| | Homo sapiens | M58478 |
| Cytochrome p450 2E1 | Erythrocebus patas | U82752 |
| | Macaca fascicularis | AF053081 |
| | Homo sapiens | D10014 |
| DARC | Macaca fascicularis | AF320031 |
| | Macaca mulatta | AF320032 |
| Differentiation-dependent A4 protein | Gorilla gorilla | AB041368 |
| | Pan troglodytes | AB041367 |
| | Pongo pygmaeus | AB041369 |
| DPYD | Macaca fascicularis | AF216268 |
| | Macaca mulatta | AF216267 |
| DQB1 | Pan troglodytes (8) | AF098646, AF098647, AF098648, AF098649, AF098650, AF098651, AF098652, AF098653 |
| DQB2 | Pan troglodytes (5) | AF098654, AF098655, AF098656, AF098657, AF098658 |
| | Homo sapiens | AF098660 |
| DRA | Aotus trivirgatus | X5962 |
| | Gorilla gorilla | X59619 |
| | Macaca fascicularis | X59621 |
| | Pan troglodytes | X57759 |
| | Papio hamadryas | X59620 |
| Epsilon-globin | Pongo pygmaeus | X05035 |
| Factor IX | Macaca fascicularis (2) | X54634, X65473 |
| | Pan troglodytes | X65472 |
| | Homo sapiens | X55008 |
| Fmr1 | Macaca arctoides | AF251350 |
| | Pan troglodytes | AF251349 |
| Gogo-A | Gorilla gorilla (6) | L32849, L32850, L38648, L38649, L38650, L38651 |
| Gogo-B | Gorilla gorilla (4) | L32847, L32848, L38647, L38652 |
| GPHA | Aotus trivirgatus | AF401995 |
| | Callicebus moloch | AF401996 |
| | Colobus guereza | AF401993 |
| | Pongo pygmaeus | AF401992 |
| | Presbytis obscura | AF401994 |
| | Tarsius bancanus | AF401997 |
| HaA | Gorilla gorilla | AJ289713 |
| | Pan troglodytes | AJ289712 |
| Histone H1t | Macaca mulatta | M97756 |
| Huntington's Disease | Gorilla gorilla | Y07988 |
| | Pan troglodytes (2) | Y07989, Y07990 |
| IFN-gamma | Callithrix jacchus | X64659 |
| | Cercocebus torquatus atys | AY486429 |
| | Macaca mulatta | AY486428 |
| IGFBP-1 | Papio anubis | AY095345 |
| IL-4 | Cercocebus torquatus atys | AY486435 |
| | Macaca mulatta (2) | AY083267, AY486434 |
| | Macaca radiata | AY083268 |
| IL-10 | Cercocebus torquatus atys | AY486432 |
| | Macaca mulatta | AY486436 |
| IL-12 p40 | Macaca mulatta | AY486436 |
| INMT | Gorilla gorilla | AB041364 |
| | Pan troglodytes | AB041363 |
| | Pongo pygmaeus | AB041365 |
| LCT | Pan troglodytes | AF282888 |
| LPA | Cercopithecus aethiops | AY192774 |
| | Colobus guereza kikuyuensis | AY192776 |
| | Erythrocebus patas | AY192775 |
| | Gorilla gorilla (2) | AY192784, AY192785 |
| | Macaca mulatta | AY192772 |
| | Macaca nemestrina | AY192771 |
| | Pan troglodytes | AY192769 |
| | Pan paniscus | AY192773 |
| | Papio hamadryas | AY192770 |
| MAO A | Gorilla gorilla | AB042831 |
| | Macaca mulatta | AJ544234 |
| | Pan troglodytes | AB042830 |
| | Pongo pygmaeus | AB042832 |
| MCP1 | Callithrix jacchus | AF493701 |
| | Macaca radiata | AF493700 |
| | Papio hamadryas | AF493699 |
| | Homo sapiens | AF493697, AF493698 |
| MC1R | Gorilla gorilla | AF387968 |
| | Pan troglodytes | AF387969 |
| MID1 | Callithrix jacchus (2) | AY112908, AY112912 |
| MSH2 | Callithrix penicillata | AJ002053 |
| | Cercopithecus patas | AJ002049 |
| | Gorilla gorilla | AJ002050 |
| | Pan troglodytes | AJ002051 |
| | Pongo pygmaeus | AJ002052 |
| | Homo sapiens | U23824 |
| Nerve growth factor | Gorilla gorilla | AB037485 |
| | Pan troglodytes (2) | AB037483, AB037484 |
| | Pongo pygmaeus | AB037486 |
| Neurofilament M | Gorilla gorilla | AB042835 |
| | Pan troglodytes | AB042834 |
| | Pongo pygmaeus | AB042836 |
| | Homo sapiens | AB042833 |
| Paan-AG | Papio anubis anubis (4) | AY434097, AY434102, AY434094, AY434096 |
| Patr-A | Pan troglodytes | L32846, L32856, L32857 |
| Patr-B | Pan troglodytes (2) | L32845, L32855 |
| PON1 | Pongo pygmaeus | AB089303 |
| | Pan troglodytes | AB089302 |
| Popy-A | Pongo pygmaeus (4) | L32843, L32844, L32853, L32854 |
| Popy-B | Pongo pygmaeus (2) | L32842, L32858 |
| Protein C | Callithrix jacchus | U77649 |
| | Cebus apella | U77652 |
| | Gorilla gorilla | U77648 |
| | Macaca fascicularis | U77654 |
| | Macaca mulatta | U77651 |
| | Pan troglodytes | U77647 |
| | Papio hamadryas | U77646 |
| | Pongo pygmaeus | U77650 |
| | Rattus norvegicus | U77653 |
| PSP94 | Papio hamadryas | U64888 |
| RB1 | Pan troglodytes | AF336015 |
| RHAG | Gorilla gorilla | AF177628 |
| | Hylobates sp. | AF177630 |
| | Macaca mulatta | AF177631 |
| | Pan troglodytes | AF177627 |
| | Papio hamadryas | AF177632 |
| | Pongo pygmaeus | AF177629 |
| | Homo sapiens | AF178844 |
| | Mus musculus | AB036994 |
| SLC6A4 | Macaca mulatta | AF191557 |
| StAR | Macaca mulatta | AY007224 |
| | Not specified | U29098 |
| TNF | Aotus trivirgatus | AF195668 |
| | Gorilla gorilla | AF195664 |
| | Hylobates lai | AF195666 |
| | Macaca mulatta | AF195667 |
| | Pan troglodytes | AF195663 |
| | Pongo pygmaeus | AF195665 |
| | Saimiri sciureus | AY208942 |
| TNFA | Cercocebus torquatus atys | AY486431 |
| | Colobus guereza | U42765 |
| | Gorilla gorilla | U42763 |
| | Macaca mulatta | AY486430 |
| | Pan troglodytes | U42626 |
| | Papio ursinus | AF027198 |
| | Pongo pygmaeus | U42764 |
| | Homo sapiens | L11698, U42625 |
| UGT1A1 | Colobus guereza | AF135469 |
| | Gorilla gorilla (2) | AF135465, AF135464 |
| | Pan paniscus | AF135462 |
| | Pan troglodytes | AF135463 |
| | Papio cynocephalus | AF135468 |
| | Pongo pygmaeus | AF135466 |
| | Homo sapiens | AF180372 |
| VHL | Gorilla gorilla | AF291825 |
| | Macaca fascicularis | AF291827 |
| | Pan troglodytes | AF291824 |
| | Papio anubis | AF291826 |
| | Homo sapiens | AF010238 |
| XRCC1 | Papio hamadryas | AF019114 |

[Flowchart boxes: Multiple sequence alignments; Obtain human and rodent promoter sequences; Screen TRANSFAC database; Retrieve primate promoter sequences from NCBI; Perform PCMC program; Create sequence logos; Develop a visualization tool; Detect putative promoter regulatory elements; Homology-hit analysis using BLASTn]

Figure 3.1 The flowchart of this study

Figure 3.2 Procedure of the visualization tool for transcription element binding sites

The overall execution procedure is as follows:

1. Select input channel 2 for the input data.

2. Enter or edit the segmental sequence input.

3. Read through the selected file, skipping comment lines that start with ">" and blank lines, and keep only the characters A, T, C and G from the remaining lines.

4. Check whether the segmental sequence appears in the selected file.

5. Generate the results in graphical format.

In the output format, the starting coordinate of each segmental sequence is marked on the input sequences.

Figure 3.3 The output format
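Steps 3 to 5 of the procedure can be sketched in Python as follows. This is a minimal text-mode sketch, not the thesis's tool: the input channel and the graphical display are not modeled, and the record naming, the inclusion of overlapping matches, and the "^" marker line standing in for the graphical output are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

VALID_BASES = set("ATCG")


def read_sequences(lines: Sequence[str]) -> Dict[str, str]:
    """Parse FASTA-like text: a line starting with '>' opens a new record,
    blank lines are ignored, and only A/T/C/G are kept from other lines."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip() or f"seq{len(records) + 1}"
            records[name] = ""
        else:
            if name is None:  # sequence text before any header line
                name = "seq1"
                records[name] = ""
            records[name] += "".join(c for c in line.upper() if c in VALID_BASES)
    return records


def find_segment(sequence: str, segment: str) -> List[int]:
    """Return 1-based start coordinates of every occurrence, overlaps included."""
    segment = segment.upper()
    if not segment:
        return []
    hits = []
    pos = sequence.find(segment)
    while pos != -1:
        hits.append(pos + 1)
        pos = sequence.find(segment, pos + 1)
    return hits


def render(sequence: str, hits: List[int]) -> str:
    """Plain-text stand-in for the graphical output: '^' under each start."""
    marks = [" "] * len(sequence)
    for start in hits:
        marks[start - 1] = "^"
    return sequence + "\n" + "".join(marks).rstrip()


if __name__ == "__main__":
    text = ">demo promoter\nTTATAAAAGGCCAATCT\n\nGGTATAAATCC\n"
    for name, seq in read_sequences(text.splitlines()).items():
        hits = find_segment(seq, "TATAAA")
        print(name, hits)
        print(render(seq, hits))
```

For the toy record above, the segment "TATAAA" is reported at positions 2 and 20; the second match spans the line break in the input, which is why the file is concatenated before searching.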

Table 4.1 Sequence divergence in nonhuman primate promoter genes

Genetic distances (transitions + transversions)

| Comparison | Gene name | Kimura two-parameter | p-distance |
|---|---|---|---|
| Same gene, same species | DQB1 | d = 0.052, SD* = 0.007 | d = 0.049, SD = 0.006 |
| Same gene, different species | UGT1A | d = 0.060, SD = 0.017 | d = 0.056, SD = 0.015 |
| | TNF | d = 0.067, SD = 0.006 | d = 0.062, SD = 0.006 |
| | Protein C | d = 0.062, SD = 0.007 | d = 0.058, SD = 0.007 |
| | LPA | d = 0.057, SD = 0.004 | d = 0.054, SD = 0.004 |
| | 5-HTT | d = 0.071, SD = 0.011 | d = 0.066, SD = 0.010 |
| | CCR5 | d = 0.051, SD = 0.005 | d = 0.048, SD = 0.005 |
| Mean distance | | d = 0.061, SD = 0.008 | d = 0.057, SD = 0.007 |

* SD: standard deviation
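Table 4.1 reports both the uncorrected p-distance and the Kimura two-parameter distance. For two aligned sequences with P transitions and Q transversions per compared site, p = P + Q and d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), so d is always at least p and grows faster as divergence rises. The Python sketch below computes both for a single pair of sequences. Skipping columns with a gap or ambiguous base in either sequence (pairwise deletion) is an assumption of this sketch; the settings used in MEGA for the table may differ, and the standard deviations in the table are not computed here.

```python
import math
from typing import Tuple

PURINES = {"A", "G"}
NUCLEOTIDES = {"A", "C", "G", "T"}


def count_differences(seq1: str, seq2: str) -> Tuple[int, int, int]:
    """Return (compared sites, transitions, transversions) for two aligned
    sequences; columns with a gap or ambiguous base are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T stay within a chemical class: transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared sites that differ."""
    sites, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / sites


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance: -1/2 ln(1-2P-Q) - 1/4 ln(1-2Q)."""
    sites, ts, tv = count_differences(seq1, seq2)
    p, q = ts / sites, tv / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    a, b = "AAAAAAAAAA", "GAAAAAAAAT"  # one transition, one transversion
    print(p_distance(a, b), round(kimura_2p(a, b), 4))  # 0.2 0.2341
```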

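Table 4.2 Summary of putative binding sites identified from the TRANSFAC database

The putative binding sites identified from the TRANSFAC database were counted in both the conserved and the less-conserved regions. Genes with only one published sequence were excluded from this comparison; their counts are given in parentheses (*).

| Gene name | Conserved region: binding sites | Conserved region: binding factors | Less-conserved region: binding sites | Less-conserved region: binding factors | Number of sequences |
|---|---|---|---|---|---|
| 11 beta hydroxylase | (20, 18)* | | | | 1 |
| 5-HTT | 5 | 3 | 73 | 6 | 12 |
| AFP | 29 | 23 | 0 | 0 | 3 |
| Apolipoprotein (a) | (84, 0)* | | | | 1 |
| APP | 8 | 4 | 357 | 69 | 6 |
| BMP2 | 0 | 0 | 150 | 26 | 3 |
| Brain-2 / N-Oct 3 | 50 | 16 | 16 | 6 | 4 |
| C-III | (95, 29)* | | | | 1 |
| CCR2 | 13 | 10 | 1 | 1 | 2 |
| CCR5 | 0 | 0 | 20 | 14 | 22 |
| CFTR | 57 | 28 | 132 | 34 | 3 |
| Cytochrome p450 2E1 homolog | 9 | 5 | 45 | 24 | 2 |
| DARC | 11 | 5 | 0 | 0 | 2 |
| Differentiation-dependent A4 protein | 40 | 23 | 2 | 2 | 3 |
| DPYD | 86 | 32 | 0 | 0 | 2 |
| DQB1 | 30 | 19 | 20 | 14 | 8 |
| DQB2 | 31 | 17 | 6 | 6 | 5 |
| DRA | 46 | 29 | 6 | 6 | 5 |
| Epsilon-globin | (229, 42)* | | | | 1 |
| Factor IX | 8 | 7 | 3 | 3 | 3 |
| Fmr1 | 24 | 10 | 8 | 6 | 2 |
| Gogo-A | 11 | 11 | 26 | 13 | 6 |
| Gogo-B | 28 | 20 | 5 | 4 | 4 |
| GPHA | 0 | 0 | 31 | 22 | 6 |
| HaA | 42 | 21 | 11 | 8 | 2 |
| Histone H1t | (41, 12)* | | | | 1 |
| Huntington's Disease | 23 | 14 | 0 | 0 | 3 |
| IFN-gamma | 32 | 19 | 107 | 32 | 3 |
| IGFBP-1 | (242, 44)* | | | | 1 |
| IL-10 | 75 | 28 | 1 | 1 | 2 |
| IL-12 p40 | (34, 15)* | | | | 1 |
| IL-4 | 24 | 17 | 35 | 13 | 4 |
| INMT | 9 | 7 | 3 | 3 | 3 |
| LCT | (43, 19)* | | | | 1 |
| LPA | 19 | 9 | 128 | 41 | 10 |
| MAO A | 43 | 16 | 57 | 19 | 4 |
| MCP1 | 9 | 3 | 17 | 12 | 3 |
| MC1R | 368 | 50 | 66 | 22 | 2 |
| MID1 | 0 | 0 | 16 | 14 | 2 |
| MSH2 | 22 | 10 | 37 | 14 | 5 |
| Nerve growth factor | 23 | 12 | 14 | 6 | 4 |
| Neurofilament M | 19 | 11 | 13 | 4 | 3 |
| Paan-AG | 15 | 8 | 98 | 37 | 4 |
| Patr-A | 23 | 18 | 4 | 3 | 3 |
| Patr-B | 29 | 21 | 1 | 1 | 2 |
| PON1 | 72 | 28 | 6 | 4 | 2 |
| Popy-A | 19 | 16 | 13 | 10 | 4 |
| Popy-B | 29 | 21 | 2 | 2 | 2 |
| Protein C | 12 | 7 | 24 | 13 | 8 |
| PSP94 | (60, 28)* | | | | 1 |
| RB1 | (161, 36)* | | | | 1 |
| RHAG | 21 | 8 | 47 | 22 | 6 |
| SLC6A4 | (74, 9)* | | | | 1 |
| Steroidogenic acute regulatory protein | (82, 21)* | | | | 1 |
| TNF | 14 | 8 | 61 | 21 | 7 |
| TNFA | 0 | 0 | 144 | 35 | 7 |
| UGT1A1 | 2 | 2 | 17 | 13 | 7 |
| VHL | 8 | 6 | 31 | 19 | 4 |
| XRCC1 | (182, 33)* | | | | 1 |
| Total | 1439 | 82 | 1854 | 89 | 222 |

*: (number of binding sites, number of binding factors)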
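Table 4.2 splits the TRANSFAC hits of each gene between conserved and less-conserved regions. The counting step can be sketched in Python as follows, assuming each predicted site is a (factor name, (start, end)) pair in 1-based alignment coordinates and that a site counts as conserved only when it lies entirely inside one conserved block. The thesis does not state its overlap rule, so that rule, the data layout and the example sites are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive alignment coordinates


def within_block(site: Interval, blocks: Iterable[Interval]) -> bool:
    """True if the site lies entirely inside one conserved block."""
    start, end = site
    return any(b_start <= start and end <= b_end for b_start, b_end in blocks)


def tally_sites(sites: List[Tuple[str, Interval]],
                blocks: List[Interval]) -> Dict[str, Tuple[int, int]]:
    """Count (binding sites, distinct binding factors) per region type."""
    factors: Dict[str, List[str]] = {"conserved": [], "less-conserved": []}
    for factor, site in sites:
        region = "conserved" if within_block(site, blocks) else "less-conserved"
        factors[region].append(factor)
    return {region: (len(names), len(set(names)))
            for region, names in factors.items()}


if __name__ == "__main__":
    conserved_blocks = [(1, 20), (40, 60)]  # hypothetical block coordinates
    predicted = [("Sp1", (3, 12)), ("Sp1", (45, 54)),
                 ("AP-2", (15, 25)), ("GATA-1", (70, 75))]
    print(tally_sites(predicted, conserved_blocks))
```

Here the two Sp1 sites fall inside conserved blocks (2 sites, 1 factor), while the AP-2 site straddles a block boundary and, under the full-containment rule, is counted with GATA-1 as less-conserved (2 sites, 2 factors).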

[Bar charts. (a) x-axis: gene name, from IGFBP-1 to LPA; y-axis: number, 0 to 25. (b) x-axis: species, from Tarsius bancanus to Pan troglodytes; y-axis: number, 0 to 55.]

Figure 4.1 (a) Distribution of the number of species available per gene for nonhuman primates. (b) Distribution of the number of genes available per nonhuman primate species.

[Neighbor-joining trees of gorilla promoter sequences. Leaves are labeled by accession number and gene, for example AY192785_LPA, AB053570_AFP, AJ289713_HaA, AF252560_CCR5, U42763_TNFA, AB061804_5-HTT, U77648_Protein C, AJ002050_MSH2, L32850_Gogo-A and L38647_Gogo-B; bootstrap values are shown at the nodes; scale bar: 0.1.]

The trees were built with the neighbor-joining algorithm and bootstrap analysis in MEGA version 2.1. Evolutionary distances were estimated with a p-distance matrix.

Figure 4.2 Phylogenetic trees of Gorilla


Figure 4.3 Relationship between hit numbers and E-values (-log E-value scale) obtained with the BLAST program
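Figure 4.3 relates hit numbers to -log(E-value) from the homology-hit analysis. One way to tabulate such a distribution is sketched in Python below, assuming BLAST tabular output (legacy `blastall -m 8` or BLAST+ `-outfmt 6`, both of which place the E-value in the 11th of 12 tab-separated columns). The integer binning and the cap applied when BLAST reports E = 0 are assumptions of this sketch, not settings taken from the thesis.

```python
import math
from collections import Counter
from typing import Iterable


def hits_by_log_evalue(lines: Iterable[str], cap: float = 200.0) -> Counter:
    """Count BLAST hits per integer bin of -log10(E-value).

    Expects tabular output lines (12 tab-separated columns, E-value in
    column 11); blank lines and '#' comment lines are skipped. An E-value
    of exactly 0 is assigned to the cap bin instead of infinity."""
    bins: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        evalue = float(line.rstrip("\n").split("\t")[10])
        score = cap if evalue == 0 else min(cap, -math.log10(evalue))
        bins[math.floor(score)] += 1
    return bins


if __name__ == "__main__":
    example = [
        "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t2e-30\t180",
        "q1\ts2\t90.0\t80\t8\t0\t1\t80\t1\t80\t5e-5\t60",
        "q1\ts3\t100\t300\t0\t0\t1\t300\t1\t300\t0.0\t550",
    ]
    for bin_value, count in sorted(hits_by_log_evalue(example).items()):
        print(bin_value, count)
```

Plotting the resulting counts against the bin values gives a hit-number versus -log E-value curve of the kind summarized in the figure.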

(A)

CLUSTAL W (1.81) multiple sequence alignment

L32850_Gorilla_gorilla    TCTCCGCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L38650_Gorilla_gorilla    TCTCCGCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L38649_Gorilla_gorilla    TCTCCTCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L38648_Gorilla_gorilla    TCTCCGCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L32849_Gorilla_gorilla    TCTCCGCAGTTTCTCCTCT---TCTCACAACCTGCGTCGGGTCCTTCTTC
L38651_Gorilla_gorilla    TCTCCGCAGTTTCTCTTCTCCCTCTCCCAACTTATGTAGGGTCCTTCTTC
                          ***** ********* ***   **** **** *  ** ************

L32850_Gorilla_gorilla    CTAGATACTCACGACGCGGACCCAGTTCTCACTGCCATTGGGTGTCGGGT
L38650_Gorilla_gorilla    CTAGATACTCACGACGCGGACCCAGTTCTCACTGCCATTGGGTGTCGGGT
L38649_Gorilla_gorilla    CTAGATACTCACGACGCGGTGCCAGTTCTCACTGCCATTGGGTGTCGGGT
L38648_Gorilla_gorilla    CTAGATACTCACGACGCGGACCCAGTTCTCACTGCCATTGGGTGTCGGGT
L32849_Gorilla_gorilla    CTAGATACTCACGAAGCGGACCCAGTTCTCACTGCCATTGGGTGTCGGGT
L38651_Gorilla_gorilla    CTGGACACTCAGGATGTGGACTCAGTTCTCACTCCCATTTGGTGTCGGGT
                          ** ** ***** ** * **   *********** ***** **********

CAAT box    TATA box

L32850_Gorilla_gorilla    TTCTAGAGAAG-CCAATCAGTGTCATCGCGGT-CCCGGTTCTAAAGTCCC
L38650_Gorilla_gorilla    TTCTAGAGAAG-CCAATCAGTGTCATCGCGGT-CCCGGTTCTAAAGTCCC
L38649_Gorilla_gorilla    TTCTAGAGAAGACCAATCAGTGTCATCTCGGTGTCCGGTTCTAAAGTCCC
L38648_Gorilla_gorilla    TTCTAGAGAAG-CCAATCAGTGTCATCGCGGT-CCCGGTTCTAAAGTCCC
L32849_Gorilla_gorilla    TTCTAGAGAAG-CCAATCAGTGTCATCGCGGT-CCCGGTTCTAAAGTCCC
L38651_Gorilla_gorilla    TTCTAGCGAAG-CCAATCGGCGTCGCTGGGGTCCCTGTTCCAGAAGTCCC
                          ****** **** ****** * ***     ***  * * * *  *******

L32850_Gorilla_gorilla    CAGGCACCCACCCGGCCTCAGATTCTCCCCAGACGCCCGCG
L38650_Gorilla_gorilla    CAGGCACCCACCCGGCCTCAGATTCTCCCCAGACGCCCGCG
L38649_Gorilla_gorilla    CAGGCACCCACCCGGCCTCAGATTCTCTCCAGACACCGAGG
L38648_Gorilla_gorilla    CAGGCACCCACCCGGCCTCAGATTCTCCCCAGACGCCGAGG
L32849_Gorilla_gorilla    CAGGGAACCACCCGGACTCAGATTCTCCCCAGACGCCGAGG
L38651_Gorilla_gorilla    CGCGAACACATTGGGACTCAGATTCTCCCCAGACGCCGAGG
                          *  * *  **   ** *********** ****** **   *

(B)

Clustal Distance Matrix

       (1)    (2)    (3)    (4)    (5)
(2)   0.054
(3)   0.016  0.038
(4)   0.016  0.038  0.000
(5)   0.038  0.059  0.022  0.022
(6)   0.199  0.225  0.183  0.183  0.183

(1) L32850_Gorilla gorilla; (2) L38649_Gorilla gorilla; (3) L38650_Gorilla gorilla; (4) L38648_Gorilla gorilla; (5) L32849_Gorilla gorilla; (6) L38651_Gorilla gorilla

(C)

Figure 4.4 Three types of ClustalW output for the Gogo-A promoter sequences

(A) Intra-specific alignment of Gogo-A promoter sequences from gorilla; the intra-specifically conserved TATA box and CAAT box are clearly identifiable. (B) Pairwise distance matrix; smaller values indicate more similar sequences. (C) Phylogenetic tree generated by PHYLIP's Drawgram.

(A)

CLUSTAL W (1.81) multiple sequence alignment

AFP_Pongo_pygmaeus     AGTTTGAGGAGAATATTTGTTATATTTGCAAAATAAAATAAGTTTGCAAG
AFP_Pan_troglodytes    AGTTTGAGGAGAATATTTGTTATATTTGCAAAATAAAATAAGTTTGCAAG
AFP_Gorilla_gorilla    AGTTTGAGGAGAATATTTGTTATATTTGCAAAATAAAATAAGTTTGCAAG
AFP_Homo_sapiens       --------------------------------------------------
AFP_Mus_musculus       --------------------------------------------------

AFP_Pongo_pygmaeus     TTTTTTTT-CTGCCCCAAAGAGGTCTGTGTCCTTGAACATAAAATACAAA
AFP_Pan_troglodytes    TTTTTTTTTCTGCCCCAAAGAGGTCTGTGTCCTTGAACATAAAATACAAA
AFP_Gorilla_gorilla    TTTTTTTTTCTGCCCCAAAGAGGTCTGTGTCCTTGAACATAAAATACAAA
AFP_Homo_sapiens       ---------------CAAAGAGCTCTGTGTCCTTGAACATAAAATACAAA
AFP_Mus_musculus       -------------TCTGAAGTGGTCTTTGTCCTTGAACATAGGATACAAG
                                        *** * *** **************  ******

AFP_Pongo_pygmaeus     TAACCGCTATGCTGTTAATTATTGGCAAATGTCCCATTTTCAACCTAAGG
AFP_Pan_troglodytes    TAACCGCTATGCTGTTAATTATTGGCAAATGTCCCATTTTCAACCTAAGG
AFP_Gorilla_gorilla    TAACCGCTATTCTGTTAATTATTGGCAAATGTCCCATTTTCAACCTAAGG
AFP_Homo_sapiens       TAACCGCTATGCTGTTAATTATTGGCAAATGTCCCATTTTCAACCTAAGG
AFP_Mus_musculus       TGACCCCTGCTCTGTTAATTATTGGCAAATTGCCTAACTTCAACGTAAGG
                       * *** **   *******************  ** *  ****** *****

AFP_Pongo_pygmaeus     AAATACCATAAAGTAACAGATATACCAACAAAAGGTTACTAGTTAACAGG
AFP_Pan_troglodytes    AAATACCATAAAGTAACAGATATACCAACAAAAGGTTACTAGTTAACAGG
AFP_Gorilla_gorilla    AAATACCATAAAGTAACAGATATACCAACAAAAGGTTACTAGTTAACAGG
AFP_Homo_sapiens       AAATACCATAAAGTAACAGATATACCAACAAAAGGTTACTAGTTAACAGG
AFP_Mus_musculus       AAATA-----GAGTCATATGTTTGCTCACTGAAGGTTACTAGTTAACAGG
                       *****      *** * *  * * *  **  *******************

AFP_Pongo_pygmaeus     CATTGCCTGAAAAGAGTATAAAAGAATTTCAGCACGATTTTCC---ATAT
AFP_Pan_troglodytes    CATTGCCTGAAAAGAGTATAAAAGAATTTCAGCACGATTTTCC---ATAT
AFP_Gorilla_gorilla    CATTGCCTGAAAAGAGTATAAAAGAATTTCAGCACGATTTTCC---ATAT
AFP_Homo_sapiens       CATTGCCTGAAAAGAGTATAAAAGAATTTCAGCATGATTTTCC-------
AFP_Mus_musculus       CATCCCTTAAACAGGATATAAAAGGACTTCAGCAGGACTGCTCGAAACAT
                       ***  * * ** **  ******** * ******* ** *   *

AFP_Pongo_pygmaeus     TCTGCTTCCACCACTGCCAATAACAAAATAACTAGCAACCA
AFP_Pan_troglodytes    TCTGCTTCCACCACTGCCAATAACAAAATAACTAGCAACCA
AFP_Gorilla_gorilla    TCTGCTTCCACCACTGCCAATAACAAAATAACTAGCAACCA
AFP_Homo_sapiens       -----------------------------------------
AFP_Mus_musculus       CCCACTTCCAGCACTGCCTGCGGTGAAGGAACAAGCAGCCA

(B)

Clustal Distance Matrix

       (1)    (2)    (3)    (4)
(1)   0.000
(2)   0.000
(3)   0.003  0.003
(4)   0.011  0.011  0.017
(5)   0.273  0.273  0.268  0.254

(1) Pongo pygmaeus; (2) Pan troglodytes; (3) Gorilla gorilla; (4) Homo sapiens; (5) Mus musculus

(C)

Figure 4.5 Three types of ClustalW output for the AFP promoter sequences

(A) Inter-specific alignment of AFP promoter sequences from human, chimpanzee, orangutan, gorilla and mouse; the inter-specifically conserved TATA box is clearly identifiable. (B) Pairwise distance matrix; smaller values indicate more similar sequences. (C) Phylogenetic tree generated by PHYLIP's Drawgram.
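The "*" rows in the ClustalW output of Figures 4.4 and 4.5 mark alignment columns in which every sequence carries the same nucleotide. A minimal Python sketch of how such a row, and runs of fully conserved columns, can be derived from aligned sequences follows. It treats any column containing a gap as non-conserved, and `min_len` is a hypothetical threshold for calling a conserved block, not a parameter taken from the thesis or from ClustalW.

```python
from typing import List, Tuple


def conservation_line(rows: List[str]) -> str:
    """'*' where every aligned sequence has the same base; gaps never count."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    line = []
    for column in zip(*rows):
        bases = {c.upper() for c in column}
        line.append("*" if len(bases) == 1 and "-" not in bases else " ")
    return "".join(line)


def conserved_blocks(line: str, min_len: int = 5) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) runs of '*' at least min_len long."""
    blocks: List[Tuple[int, int]] = []
    start = None
    # A trailing space closes a run that reaches the end of the line.
    for i, ch in enumerate(line + " ", start=1):
        if ch == "*" and start is None:
            start = i
        elif ch != "*" and start is not None:
            if i - start >= min_len:
                blocks.append((start, i - 1))
            start = None
    return blocks


if __name__ == "__main__":
    aligned = ["ACGTTGCA", "ACGATGCA", "ACG-TGCA"]
    stars = conservation_line(aligned)
    print(repr(stars))                         # '*** ****'
    print(conserved_blocks(stars, min_len=3))  # [(1, 3), (5, 8)]
```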

[Tree panels, with bootstrap values at the nodes. BMP2: Cercopithecus1, Cercopithecus2, Saimiri; scale bar 0.1. GPHA: Colobus, Presbytis, Pongo, Callicebus, Aotus, Tarsius; bootstrap values 85, 90 and 56; scale bar 0.02. IL-4: Macaca1, Macaca2, Macaca3, Cercocebus; bootstrap value 37. VHL: Gorilla, Papio, Pan, Macaca; bootstrap value 70; scale bar 0.05.]

The trees were built with the neighbor-joining algorithm and bootstrap analysis in MEGA version 2.1. Evolutionary distances were estimated with a Kimura two-parameter matrix.

Figure 4.6 Phylogenetic trees of lower-similarity genes (BMP2, GPHA, IL-4 and VHL)

Figure 4.7 Output format of the PCMC program for the AFP promoter gene

[Panels for AFP, Brain-2 / N-Oct 3 and Nerve growth factor, each with parts a, b and c]

Figure 4.8 Putative conserved regulatory element sites of higher-similarity promoter sequences displayed by the visualization tool, from the PCMC program (a), the TRANSFAC database (b) and multiple sequence alignment (c)

The promoter sequences differ in how densely their putative conserved regulatory element sites are distributed. AFP, Brain-2 / N-Oct 3 and Nerve growth factor are higher-similarity (99-100%) promoter genes, and their putative conserved regulatory elements occupy extensive ranges of the promoter sequence.

[Panels for APP and VHL with parts a and b; GPHA and TNFA with part a only]

Figure 4.9 Putative conserved regulatory element sites of lower-similarity promoter sequences displayed by the visualization tool, from the PCMC program (a) and the TRANSFAC database (b)

APP, GPHA, TNFA and VHL are lower-similarity (28-55%) promoter genes that contain fewer conserved regulatory elements, so their diagrams show only sparse, scattered matches. In addition, TRANSFAC reports no putative regulatory elements in the 100% conserved regions of GPHA and TNFA.

Figure 4.10 Distribution of putative regulatory elements in conserved regions (a) and less-conserved regions (b)

[Sequence logos (x-axis: position; y-axis: bits) from our study and from the JASPAR database for the transcription factors AP-2, CREB, Sp1, SRF, IRF-2, c-Myb and GATA-1]

Figure 4.11 Comparison of some consensus binding sequences from our study and the JASPAR database

[Sequence logos (x-axis: position; y-axis: bits) for the TATA box and CAAT box regulatory elements, from NCBI^a, the JASPAR database and our study]

Figure 4.12 Comparison of TATA box and CAAT box sequences from NCBI, the JASPAR database and our study

a The identifiable TATA box and CAAT box sequences of nonhuman primate promoters from NCBI.

Class | Transcription factor* | Positions | Position frequency (A; T; C; G)

AP related | AP-1 | 1-7 | A 26 6 26; T 26 20 4; C 15 4 4 30; G 4 15 4 20 6
AP related | AP-2 | 1-6 | A 3 3 8 6 1 53; T 54 3 9 3; C 10 12 17 9 12; G 1 50 43 20 46 12
CAAT box related | alpha-CBF, alpha-IRP | 1-5 | A 28 24; T 29; C 32 31 3; G 1 4 8
CAAT box related | CBF-B | 1-5 | A 71 3 1 3 4; T 6 61 72 8 6; C 25 8 70 4; G 12 8 8 77
CAAT box related | CCAAT-binding factor | 1-5 | A 62 10 1 3 4; T 6 53 66 13 6; C 3 24 13 11 4; G 17 1 8 61 74
CAAT box related | CDP, H1TF2 | 1-5 | A 28 28; T 29; C 32 31 3; G 1 4 4
CAAT box related | CP1 | 1-5 | A 1 28 24; T 29; C 32 32 3 3; G 2 3 7 8 3
CAAT box related | CP2 | 1-5 | A 17 28 24 17; T 29; C 32 31 17 17 3; G 18 4 8
CAAT box related | CTF | 1-5 | A 70 10 5 3 7; T 10 61 74 13 6; C 3 24 13 15 5; G 17 5 8 69 82
CAC related | CAC-binding factor | 1-5 | A 10 16 126 13; T 6 21 29 13 9; C 122 104 10 106 123; G 40 37 13 46 46
CAC related | CACCC-binding factor | 1-5 | A 12 159 7 1 6; T 16 8 4 14 5; C 149 9 143 164 159; G 11 12 34 9 18
CAC related | gammaCAC2 | 1-5 | A 3 155 14 1 8; T 19 18 5; C 160 16 151 167 169; G 16 4 19 5
NF related | NF-1 | 1-4 | A 50 11 50 386; T 197 27 18 22; C 106 427 383 32; G 129 17 31 42
NF related | NF-1/L | 1-5 | A 4 7 3 2 65; T 101 62 5 11 8; C 17 57 3 73 51; G 4 115 40 5
NF related | NF-E | 1-5 | A 10 6 68 30 3; T 107 85 22 88 41; C 13 44 10 89; G 12 7 52 14 9
c-Myc related | c-Myc | 1-6 | A 20 9; T 15 4 5 31 37 4; C 25 6 2 9 3 6; G 10 24 30
F2F related | F2F | 1-6 | A 3 34 31 39 34 6; T 31 35; C 3 1 1; G 5 8 10 3 8
GATA-1 related | GATA-1 | 1-6 | A 33 14 81 41 22 46; T 70 42 49 85 52 23; C 29 38 1 43 47; G 7 45 9 4 22 23
GR related | GR | 1-5 | A 51 134 30 171 45; T 103 14 136 98 23; C 130 62 58 37 177; G 63 137 123 41 102
H4TF-2 related | H4TF-2 | 1-5 | A 5 15 1 3 3; T 1 1 36 5; C 11 49 41; G 46 36 4 3
HiNF-A related | HiNF-A | 1-7 | A 56 21 52 34 50 18; T 4 3 3 40; C 21 3 18 10; G 5 36 6 3 8 3 33
LF-A1 related | LF-A1 | 1-5 | A 3 2 2 40; T 1 10 3; C 1 35 4; G 43 46 46 3 1
PEA3 related | PEA3 | 1-6 | A 27 6 3 24 30 31; T 1; C 6 5 7; G 6 33 31 15 9
Pit-1 related | Pit-1 | 1-5 | A 28 62 57 71 30; T 46 16 13 16 47; C 9 11 5 16; G 14 8 27 5 4
Sp1 related | Sp1 | 1-5 | A 33 4 14 132 33; T 125 19 7 4 58; C 102 110 189 134 173; G 34 161 84 24 30

*: Data from the TRANSFAC database [35].

Figure 4.13 Sequence representation and position frequency of putative consensus regulatory elements in non-human primates

Position frequencies were counted from the putative binding-site sequences in the conserved regions. Sequence logos give a graphical view of each regulatory element sequence.
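The counting and logo steps described above can be sketched in a few lines: count each base at every position of equal-length aligned binding sites, then convert each column to information content in bits, which is the height scale of a sequence logo. This sketch assumes a uniform background of 0.25 per base and applies no small-sample correction, so it is a simplified version of the usual logo calculation rather than the exact procedure used in this study. The function names and the toy TATA-like sites are hypothetical.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def position_frequency_matrix(sites: List[str]) -> Dict[str, List[int]]:
    """Count A/C/G/T at each position of equal-length, pre-aligned sites."""
    if not sites:
        raise ValueError("no binding sites given")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("binding sites must be aligned to the same length")
    pfm = {b: [0] * length for b in BASES}
    for site in sites:
        for i, base in enumerate(site.upper()):
            if base in pfm:  # ambiguous letters such as N are ignored
                pfm[base][i] += 1
    return pfm


def information_content(pfm: Dict[str, List[int]]) -> List[float]:
    """Per-position information content in bits: 2 - H (uniform background)."""
    ic = []
    for i in range(len(pfm["A"])):
        total = sum(pfm[b][i] for b in BASES)
        if total == 0:
            ic.append(0.0)  # no informative bases in this column
            continue
        entropy = 0.0
        for b in BASES:
            if pfm[b][i]:
                p = pfm[b][i] / total
                entropy -= p * math.log2(p)
        ic.append(2.0 - entropy)
    return ic


def consensus(pfm: Dict[str, List[int]]) -> str:
    """Most frequent base per position (ties resolved in A, C, G, T order)."""
    return "".join(max(BASES, key=lambda b: pfm[b][i]) for i in range(len(pfm["A"])))


if __name__ == "__main__":
    toy_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAT"]  # illustrative only
    pfm = position_frequency_matrix(toy_sites)
    print(consensus(pfm), [round(x, 2) for x in information_content(pfm)])
```

Fully conserved columns reach the 2-bit maximum, while a 3:1 column drops to about 1.19 bits; in a logo, each letter's height would be its frequency times that column value.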

Appendix I Taxonomy of primates from NCBI

o Primates (order)
  o Catarrhini
    o Hominidae (family)
      o Homo (genus)
          Homo sapiens (species)
      o Pan (chimpanzees)
          Pan troglodytes, Pan paniscus
      o Gorilla (gorillas)
          Gorilla gorilla
      o Pongo (orangutans)
          Pongo pygmaeus
    o Hylobatidae
      o Hylobates (gibbons)
          Hylobates muelleri, Hylobates lar, Hylobates sp.
    o Cercopithecidae (Old World monkeys)
      o Cercopithecinae (subfamily)
        o Cercocebus (mangabeys)
            Cercocebus torquatus atys
        o Cercopithecus (guenons)
            Cercopithecus aethiops
        o Erythrocebus (patas monkeys)
            Erythrocebus patas
        o Macaca (macaques)
            Macaca arctoides, Macaca fascicularis, Macaca mulatta, Macaca nemestrina, Macaca radiata
        o Mandrillus (forest baboons)
            Mandrillus leucophaeus
        o Papio (baboons)
            Papio anubis, Papio cynocephalus, Papio hamadryas, Papio ursinus
      o Colobinae (subfamily)
        o Colobus (colobus monkeys)
            Colobus guereza
        o Presbytis (leaf monkeys)
            Presbytis obscura
  o Platyrrhini (New World monkeys)
    o Callitrichidae (family)
      o Callithrix (marmosets)
          Callithrix penicillata, Callithrix jacchus
    o Cebidae
      o Aotinae (night monkeys)
          Aotus trivirgatus
      o Atelinae (spider monkeys)
          Ateles geoffroyi
      o Callicebinae
          Callicebus moloch (dusky titi)
      o Cebinae
          Cebus apella (capuchin monkey), Saimiri sciureus (squirrel monkey)
  o Tarsii
    o Tarsiidae
      o Tarsius (tarsiers)
          Tarsius bancanus

Appendix II Primate images from Primate Info Net^a

Pan troglodytes^b (Chimpanzee^c)

Pan paniscus (Bonobo; pygmy chimpanzee)

Gorilla gorilla (Gorilla)

Pongo pygmaeus (Orangutan)

Hylobates muelleri (Mueller's gibbon; grey gibbon)

Hylobates lar (White-handed gibbon)

Hylobates sp. (Gibbon)

Cercocebus torquatus atys (Sooty mangabey)

Cercopithecus aethiops (African green monkey)

Erythrocebus patas (Red guenon; patas monkey)

Macaca arctoides (Stump-tailed macaque)

Macaca fascicularis (Crab-eating macaque)
