In this study, we selected 21 Proteobacteria genomes retrieved from NCBI as the testing dataset (the two chromosomes of V. cholerae, I and II, are listed separately): R. prowazekii (abbreviated as Rp, NC_000963), R. solanacearum (Rs, NC_003295), N. meningitidis MC58 (NmM, NC_003112), N. meningitidis Z2491 (NmZ, NC_003112), E. coli K12 (EcK, NC_000913), E. coli O157:H7 EDL933 (EcO, NC_002655), S. enterica subsp. enterica serovar Typhi Ty2 (Se, NC_003198), S. typhimurium LT2 (St, NC_003197), Y. pestis KIM (Yp, NC_004088), B. floridanus (Bf, NC_005061), B. aphidicola str. Bp (BaB, NC_004545), B. aphidicola str. Sg (BaS, NC_004061), B. aphidicola str. APS (BaA, NC_002528), W. glossinidia brevipalpis (Wg, NC_004344), V. cholerae El Tor N16961 (I) (Vc, NC_002505), V. cholerae El Tor N16961 (II) (Vc, NC_002506), H. influenzae (Hi, NC_000907), P. aeruginosa (Pa, NC_002516), P. multocida (Pm, NC_002663), X. axonopodis (Xa, NC_003919), X. campestris (Xc, NC_003902) and X. fastidiosa (Xf, NC_002488).

In addition, we used as reference trees the phylogenetic trees constructed from concatenated sequences of 60 homologous proteins [21] and from 16S rRNAs (Figures 5.1 and 5.2, respectively), and compared the genome trees obtained by OGtree2.0 with the trees predicted by our previous OGtree (Figure 5.3) [13] and by BPhyOG (Figure 5.4) [6]. The phylogenetic tree in Figure 5.1 can be considered a good reference tree because it coincides with the taxonomy accepted by biologists for these Proteobacteria. In particular, the three Buchnera species in this reference tree form a monophyletic group with the other insect endosymbionts, B. floridanus and W. glossinidia. In addition, this group of endosymbionts is a sister clade to the cluster of the other four enterobacteria, Yersinia, Escherichia, Shigella and Salmonella. The phylogenetic tree in Figure 5.2, however, differs slightly from that in Figure 5.1, mainly in the positions of the Xanthomonadales group (X. axonopodis, X. campestris and X. fastidiosa) and V. cholerae. In this 16S rRNA reference tree, the γ-Proteobacteria X. axonopodis, X. campestris and X. fastidiosa were placed in the β-Proteobacteria branch, and V. cholerae was placed a little away from P. aeruginosa.

Figure 5.1: Phylogenetic tree obtained from a trimmed alignment of 60 concatenated homologous proteins using the maximum likelihood method, adapted from [21].

Figure 5.2: Phylogenetic tree obtained from 16S rRNAs using the neighbor-joining method.

Figure 5.3: Genome tree obtained using OGtree with the UPGMA method [13].

Figure 5.4: Phylogenetic tree constructed using BPhyOG [6, 7].

Inevitably, some misannotated genes may be included in the genomes of public databases. One remedy, as was done in [6], is to exclude from each downloaded genome those CDSs annotated as unknown, hypothetical or putative. However, we found that most of the CDSs in W. brevipalpis are currently annotated as unknown, hypothetical or putative, so that no orthologous OG pair can be found between W. brevipalpis and the other Proteobacteria if all these CDSs are removed from the analysis. Instead, we first removed the genes currently annotated as horizontally transferred in the HGT-DB database [22] and then applied more stringent criteria for identifying putative orthologous genes: bidirectional best hits (BBH) with an E-value cutoff of 10⁻⁸, at least 85% of each authentic CDS sequence involved in the alignment, and a minimum similarity of 45%.

In addition, we observed that the number of orthologous OG pairs between non-γ-Proteobacteria genomes and the other Proteobacteria genomes is small, which makes it difficult to measure the OG distances between them accurately. Recall that the term "gene" can be expanded to include both its coding and regulatory regions, such as promoters and transcription terminators. In prokaryotic genomes, a promoter region, which basically contains the so-called −10 hexamer, extended −10 element, −35 hexamer and UP element, usually occupies about 60 base pairs (bp) upstream of the transcription start site (TSS) [23, 24], and a terminator region usually occupies about 50 bp downstream of the stop codon [25]. In addition, as exemplified in the E. coli genome, 95% of TSSs occur within 325 bp upstream of the translation start sites (TLS) of their corresponding genes [26].
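The ortholog criteria stated above (E-value cutoff, 85% coverage of each CDS, 45% similarity, bidirectional best hits) can be sketched in code. This is a minimal illustration and not the OGtree2 implementation: the `Hit` record, its field names, and the use of the bit score to rank competing hits are assumptions made for this example, and the hit lists are presumed to come from an all-against-all protein comparison of two genomes.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One pairwise alignment (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    similarity: float   # percent similarity over the alignment
    query_cov: float    # fraction of the query CDS in the alignment
    subject_cov: float  # fraction of the subject CDS in the alignment
    bitscore: float


def passes(hit, max_evalue=1e-8, min_cov=0.85, min_sim=45.0):
    """Apply the three thresholds quoted in the text."""
    return (hit.evalue <= max_evalue
            and hit.query_cov >= min_cov
            and hit.subject_cov >= min_cov
            and hit.similarity >= min_sim)


def best_hits(hits):
    """Map each query to its best-scoring subject among passing hits."""
    best = {}
    for hit in hits:
        if not passes(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return gene pairs (a, b) that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data: a1/b1 are reciprocal best hits; a2 fails the E-value cutoff.
a_vs_b = [
    Hit("a1", "b1", 1e-30, 60.0, 0.90, 0.95, 200.0),
    Hit("a1", "b2", 1e-10, 50.0, 0.90, 0.90, 80.0),
    Hit("a2", "b2", 1e-5, 70.0, 0.90, 0.90, 40.0),
]
b_vs_a = [
    Hit("b1", "a1", 1e-30, 60.0, 0.95, 0.90, 200.0),
    Hit("b2", "a1", 1e-10, 50.0, 0.90, 0.90, 80.0),
]
orthologs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Requiring the coverage threshold on both sequences reflects the phrase "each CDS sequence" in the criteria; a one-sided check would admit short fragments aligned to long genes.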

Based on this information, we extended the region of each CDS by 385 bp (325 bp + 60 bp) at its 5' end and by 50 bp at its 3' end, so that any two adjacent genes in a genome were considered an OG pair if their extended CDSs partially or completely overlap. With default values for all the other parameters (e.g., the distance of OG order was measured using rearrangements instead of breakpoints, wc = 1 and wc = 1), we used OGtree2 to calculate the OG distance between every pair of Proteobacteria and constructed genome trees for all the Proteobacteria used in this study with the UPGMA, NJ and FM methods.
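The extension rule can be made concrete with a small sketch. The 385 bp figure is the sum of the 325 bp TSS-to-TLS distance and the 60 bp promoter length, and the extension is strand-aware: the 5' end of a gene on the reverse strand lies at its higher coordinate. The `CDS` class, the function names, and the simplifications (linear coordinates with no wrap-around on circular chromosomes, adjacency defined by sorted start position) are assumptions for illustration only and are not taken from OGtree2.

```python
from dataclasses import dataclass

UPSTREAM = 325 + 60   # 385 bp added at the 5' end
DOWNSTREAM = 50       # 50 bp added at the 3' end


@dataclass(frozen=True)
class CDS:
    name: str
    start: int   # leftmost coordinate, inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # '+' or '-'


def extended_span(cds):
    """Return (left, right) of the CDS after adding regulatory regions."""
    if cds.strand == "+":
        return cds.start - UPSTREAM, cds.end + DOWNSTREAM
    # Reverse strand: the 5' end is on the right-hand side.
    return cds.start - DOWNSTREAM, cds.end + UPSTREAM


def og_pairs(genes):
    """List adjacent gene pairs whose extended spans overlap."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        l_start, l_end = extended_span(left)
        r_start, r_end = extended_span(right)
        if max(l_start, r_start) <= min(l_end, r_end):
            pairs.append((left.name, right.name))
    return pairs


# Toy genome with invented coordinates.
genes = [
    CDS("A", 1000, 2000, "+"),
    CDS("B", 2300, 3000, "+"),
    CDS("C", 5000, 6000, "-"),
    CDS("D", 6500, 7000, "+"),
    CDS("E", 8000, 9000, "-"),
]
pairs = og_pairs(genes)
```

In the toy genome, C and D are divergently oriented, so both 385 bp upstream extensions face each other and a 500 bp gap still yields an overlap. D and E are convergent, so only the two 50 bp downstream extensions face each other and the 1000 bp gap does not.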

Consequently, both the NJ and FM trees (Figures 5.5 and 5.6, respectively) that we obtained using OGtree2 have topologies largely congruent with that of the reference tree shown in Figure 5.1. In particular, the UPGMA tree (Figure 5.7) clearly and correctly divided the 21 Proteobacteria into three monophyletic clades, and it also reflected monophyly not only for the three Buchnera species but also for a wider group including the other insect endosymbionts, B. floridanus and W. glossinidia. However, V. cholerae in the UPGMA tree was placed a little away from P. aeruginosa, as in the 16S rRNA reference tree in Figure 5.2. Among the three tree-building methods in this experiment, the UPGMA method produced a genome tree that is much more congruent with the reference tree constructed from a trimmed alignment of 60 concatenated protein sequences than the trees produced by the NJ and FM methods. This may be because, as reported in [8, 9], evolution of OGs occurs at a universal mutation rate across bacterial genomes.
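UPGMA assumes that all lineages evolve at the same rate, which is why a universal rate of OG evolution would favor it. A minimal sketch of the clustering step is given below for readers unfamiliar with the method. It is a generic textbook UPGMA that returns only the topology as nested tuples. It is not the code used by OGtree2, and the function name, input format and toy distances are assumptions for illustration.

```python
def upgma(names, matrix):
    """Cluster taxa by UPGMA; return the rooted topology as nested tuples.

    names  : list of taxon labels
    matrix : symmetric list-of-lists of pairwise distances
    """
    trees = dict(enumerate(names))   # cluster id -> subtree
    sizes = {i: 1 for i in trees}    # cluster id -> number of leaves
    dist = {(i, j): matrix[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)

    def d(x, y):
        return dist[(min(x, y), max(x, y))]

    while len(trees) > 1:
        # Merge the closest pair of clusters.
        a, b = min(dist, key=dist.get)
        new_size = sizes[a] + sizes[b]
        others = [k for k in trees if k not in (a, b)]
        # Size-weighted average distance to every remaining cluster.
        merged = {k: (sizes[a] * d(a, k) + sizes[b] * d(b, k)) / new_size
                  for k in others}
        dist = {pair: v for pair, v in dist.items()
                if a not in pair and b not in pair}
        for k, v in merged.items():
            dist[(k, next_id)] = v
        trees[next_id] = (trees.pop(a), trees.pop(b))
        sizes[next_id] = new_size
        del sizes[a], sizes[b]
        next_id += 1
    return next(iter(trees.values()))


# Toy ultrametric distances (invented values, not OG distances).
toy_names = ["A", "B", "C", "D"]
toy_matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
toy_tree = upgma(toy_names, toy_matrix)
```

When the input distances are ultrametric, as in the toy matrix, this procedure recovers the generating topology. When rates differ strongly between lineages, methods such as NJ that do not assume a clock are usually preferred.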

Figure 5.5: Genome tree obtained using OGtree2.0 with the NJ method.

Figure 5.6: Genome tree obtained using OGtree2.0 with the FM method.

Figure 5.7: Genome tree obtained using OGtree2.0 with the UPGMA method.

Compared with the phylogenetic tree inferred by BPhyOG (Figure 5.4), our genome tree produced using OGtree2 with the UPGMA method shows a more precise and robust phylogeny for the 21 Proteobacteria genomes. In the BPhyOG tree, the endosymbionts were paraphyletic; in particular, the two insect endosymbionts W. brevipalpis and B. aphidicola were separated far from each other. In addition, the three β-Proteobacteria were placed merely as neighboring taxa rather than as a sister cluster. In contrast, W. brevipalpis, B. floridanus and the three Buchnera species in our UPGMA tree (Figure 5.7), as well as in both reference trees (Figures 5.1 and 5.2), were placed as a sister group, suggesting a common origin for these five endosymbionts. Moreover, OGtree2.0 indeed outperformed its previous version, OGtree, in phylogeny reconstruction for prokaryotes: in the genome tree predicted by OGtree (Figure 5.3), the α-Proteobacterium R. prowazekii and the β-Proteobacterium R. solanacearum were placed together as a sister group, and the insect endosymbiont B. floridanus was placed in the branch of enterobacteria.
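Statements such as "the five endosymbionts form a monophyletic group" can be checked mechanically on a rooted tree. The sketch below represents a tree as nested tuples of taxon labels and tests whether a set of taxa is exactly the leaf set of some internal node. The helper names are hypothetical, and the toy topology is invented for illustration; it is not a transcription of any figure in this chapter.

```python
def leaves(tree):
    """Return the set of leaf labels under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf set of every node (including leaves) in the tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    result = {leaves(tree)}
    for child in tree:
        result |= clades(child)
    return result


def is_monophyletic(tree, taxa):
    """True if `taxa` is exactly the leaf set of some node of `tree`."""
    return frozenset(taxa) in clades(tree)


# Invented toy topology using abbreviations from the dataset list.
toy = (((("BaB", "BaS"), "BaA"), ("Wg", "Bf")), ("EcK", "Yp"))

buchnera_ok = is_monophyletic(toy, {"BaB", "BaS", "BaA"})
endosymbionts_ok = is_monophyletic(toy, {"BaB", "BaS", "BaA", "Wg", "Bf"})
mixed_ok = is_monophyletic(toy, {"Wg", "EcK"})
```

Comparing the clade sets of two trees in this way also gives a simple measure of congruence: the more clades they share, the more similar their topologies.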

Chapter 6

Conclusion

Previously, we implemented a web server named OGtree to demonstrate that overlapping genes can serve as a useful genomic marker for reconstructing genome trees of some prokaryotes. In contrast to BPhyOG, the OG distance we defined in OGtree to measure the difference between two prokaryotic genomes was based on a combination of their OG content and orthologous OG order.

In this study, we have improved the accuracy of OGtree in reconstructing prokaryotic genome trees, resulting in OGtree2.0, by extending gene regions to include their regulatory regions and by redefining the distance measure between two orthologous OG orders using genome rearrangements rather than breakpoints.

According to our experiments, the genome trees constructed by OGtree2.0 are more consistent with the reference trees reconstructed from 16S rRNAs and from concatenated sequences of 60 homologous proteins than the phylogenetic trees produced by Luo et al. [6, 7] and by OGtree. Furthermore, among the tree-building methods in our experiments, the UPGMA method produced much more congruent genome trees than the NJ and FM methods when based on the OG distance defined in this study. Luo et al. [6, 7] made the same observation in their studies, which were based only on the content of OG pairs. It has been reported that evolution of OGs occurs at a universal mutation rate across bacterial genomes [8, 9]. Perhaps because of this property, the UPGMA method is more suitable than the NJ and FM methods for reconstructing phylogenies based on OG pairs. Our experimental results on a set of 21 Proteobacteria show that the above modifications indeed helped us reconstruct a more precise and robust genome tree that coincides with the taxonomy accepted by biologists for these Proteobacteria. This suggests that OGtree2.0 can provide useful insights into the evolutionary relationships of completely sequenced prokaryotic genomes.

References

[1] Snel B, Bork P, Huynen MA: Genome phylogeny based on gene content. Nature Genetics 1999, 21:108–110.

[2] Snel B, Huynen MA, Dutilh BE: Genome trees and the nature of genome evolution. Annual Review of Microbiology 2005, 59:191–209.

[3] Blanchette M, Kunisawa T, Sankoff D: Gene order breakpoint evidence in animal mitochondrial phylogeny. Journal of Molecular Evolution 1999, 49:193–203.

[4] Sankoff D: Genome rearrangement with gene families. Bioinformatics 1999, 15:909–917.

[5] Belda E, Moya A, Silva FJ: Genome rearrangement distances and gene order phylogeny in γ-Proteobacteria. Molecular Biology and Evolution 2005, 22:1456–1467.

[6] Luo Y, Fu C, Zhang DY, Lin K: Overlapping genes as rare genomic markers: the phylogeny of γ-Proteobacteria as a case study. Trends in Genetics 2006, 22:593–596.

[7] Luo Y, Fu C, Zhang DY, Lin K: BPhyOG: an interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes. BMC Bioinformatics 2007, 8:266.

[8] Fukuda Y, Nakayama Y, Tomita M: On dynamics of overlapping genes in bacterial genomes. Gene 2003, 323:181–187.

[9] Johnson ZI, Chisholm SW: Properties of overlapping genes are conserved across microbial genomes. Genome Research 2004, 14:2268–2272.

[10] Fukuda Y, Washio T, Tomita M: Comparative study of overlapping genes in the genomes of Mycoplasma genitalium and Mycoplasma pneumoniae. Nucleic Acids Research 1999, 27:1847–1853.

[11] Krakauer DC: Stability and evolution of overlapping genes. Evolution: International Journal of Organic Evolution 2000, 54:731–739.

[12] Sakharkar KR, Sakharkar MK, Verma C, Chow VT: Comparative study of overlapping genes in bacteria, with special reference to Rickettsia prowazekii and Rickettsia conorii. International Journal of Systematic and Evolutionary Microbiology 2005, 55:1205–1209.

[13] Jiang LW, Lin KL, Lu CL: OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes. Nucleic Acids Research 2008, 36:W475–480.

[14] Snyder M, Gerstein M: Defining genes in the genomics era. Science 2003, 300(5617):258–260.

[15] Scherbakov DV, Garber MB: Overlapping genes in bacterial and phage genomes. Molecular Biology 2000, 34:485–495.

[16] Bourque G, Pevzner PA: Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Research 2002, 12:26–36.

[17] Yancopoulos S, Attie O, Friedberg R: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 2005, 21:3340–3346.

[18] Koonin EV: Orthologs, paralogs, and evolutionary genomics. Annual Review of Genetics 2005, 39:309–338.

[19] Koonin EV, Makarova KS, Aravind L: Horizontal gene transfer in prokaryotes: quantification and classification. Annual Review of Microbiology 2001, 55:709–742.

[20] Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology 2001, 314:1041–1052.

[21] Comas I, Moya A, Gonzalez-Candelas F: From phylogenetics to phylogenomics: the evolutionary relationships of insect endosymbiotic γ-Proteobacteria as a test case. Systematic Biology 2007, 56:1–16.

[22] Garcia-Vallve S, Guzman E, Montero MA, Romeu A: HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes. Nucleic Acids Research 2003, 31:187–189.

[23] Browning DF, Busby SJW: The regulation of bacterial transcription initiation. Nature Reviews Microbiology 2004, 2:57–65.

[24] Janga SC, Collado-Vides J: Structure and evolution of gene regulatory networks in microbial genomes. Research in Microbiology 2007, 158:787–794.

[25] Unniraman S, Prakash R, Nagaraja V: Conserved economics of transcription termination in eubacteria. Nucleic Acids Research 2002, 30:675–684.

[26] Burden S, Lin YX, Zhang R: Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences. Bioinformatics 2005, 21:601–607.

[27] Rogozin IB, Spiridonov AN, Sorokin AV, Wolf YI, Jordan IK, Tatusov RL, Koonin EV: Purifying and directional selection in overlapping prokaryotic genes. Trends in Genetics 2002, 18:228–232.

[28] Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science 1997, 278:631–637.

[29] Hulsen T, Huynen MA, de Vlieg J, Groenen PM: Benchmarking ortholog identification methods using functional genomics data. Genome Biology 2006, 7:4.
