Assessing Determinants of Exonic Evolutionary Rates in Mammals


Feng-Chi Chen,*,1,2,3 Ben-Yang Liao,*,1 Chia-Lin Pan,1 Hsuan-Yu Lin,1 and Andrew Ying-Fei Chang1

1 Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan, Republic of China

2 Department of Life Science, National Chiao-Tung University, Hsinchu, Taiwan, Republic of China

3 Department of Dentistry, China Medical University, Taichung, Taiwan, Republic of China

*Corresponding author: E-mail: fcchen@nhri.org.tw; liaoby@nhri.org.tw.

Associate editor: John H. McDonald

Abstract

From studies investigating the differences in evolutionary rates between genes, gene compactness and gene expression level have been identified as important determinants of gene-level protein evolutionary rate, as represented by the nonsynonymous to synonymous substitution rate (dN/dS) ratio. However, the causes of exon-level variances in dN/dS are less understood. Here, we use principal component regression to examine to what extent 13 exon features explain the variance in dN, dS, and the dN/dS ratio of human–rhesus macaque or human–mouse orthologous exons. The exon features were grouped into six functional categories: expression features, mRNA splicing features, structural–functional features, compactness features, exon duplicability, and other features, including G + C content and exon length. Although expression features are important for determining dN and dN/dS between exons of different genes, structural–functional features and splicing features explained more of the variance for exons of the same genes. Furthermore, we show that compactness features can explain only a relatively small percentage of variance in exon-level dN or dN/dS in either between-gene or within-gene comparisons. By contrast, dS yielded inconsistent results in the human–mouse comparison and the human–rhesus macaque comparison. This inconsistency may suggest rapid evolutionary changes of the mutation landscape in mammals. Our results suggest that between-gene and within-gene variation in dN/dS (and dN) are driven by different evolutionary forces and that the role of mRNA splicing in causing the variation in evolutionary rates of coding sequences may be underappreciated.

Key words: exon evolution, alternative splicing, structural–functional constraint, gene expression, gene compactness.

Introduction

The evolutionary rates of different protein-coding genes in a genome can vary by several orders of magnitude (Li 1997). This variation has been extensively studied and is typically explained by differences in mutation rate and selection intensity among genes. In the past few years, data generated by whole-genome sequencing and functional genomic assays have provided biologists an unprecedented opportunity to address this issue systematically. As a result, several biological factors associated with and potentially underlying evolutionary rate variations of protein-coding genes have been identified. These factors include gene essentiality (Hirsh and Fraser 2001; Jordan et al. 2002; Zhang and He 2005; Liao et al. 2006), gene expression level (Pal et al. 2001a; Akashi 2003; Subramanian and Kumar 2004; Drummond et al. 2006), tissue specificity of gene expression (Hastings 1996; Duret and Mouchiroud 2000; Subramanian and Kumar 2004; Winter et al. 2004; Zhang and Li 2004), presence of a duplicate paralog (Nembaware et al. 2002; Castillo-Davis and Hartl 2003; Yang et al. 2003), properties in the protein interaction network (Fraser et al. 2002; Fraser 2005; Hahn and Kern 2005; Kim et al. 2007), tendency to form misinteracting protein complex (Yang et al. 2012), local recombination rate (Pal et al. 2001b), pleiotropy (He and Zhang 2006), amino acid composition (Xia et al. 2009), structural features of protein folding (Bloom et al. 2006; Zhou et al. 2008; Franzosa and Xia 2009), G + C content (Xia et al. 2009), gene compactness (Liao et al. 2006), and subcellular localization (Liao et al. 2010).

All the abovementioned studies focused on sequence evolution of protein-coding genes as a whole. However, evolutionary rates also differ among regions of the same protein. For example, structurally ordered protein regions evolve more slowly than intrinsically disordered regions (IDRs) (Brown et al. 2002, 2010; Chen et al. 2011), and protein regions encoded by alternatively spliced exons (ASEs) evolve more rapidly than those encoded by constitutively spliced exons (CSEs) (Chen et al. 2006, 2007, 2011; Ramensky et al. 2008). Thus, exon features have a profound effect on within-gene variation in evolutionary rate. Because exon–intron structure is an important characteristic of multicellular eukaryotic genes that causes complexity (Sorek et al. 2004; Xing and Lee 2007) and diversity (Xing and Lee 2005; Chen et al. 2006, 2011; Chen and Chuang 2007; Keren et al. 2010) of proteomes, a systematic analysis to delineate the individual and integrative contributions of exon features to within-gene evolutionary rate variation is necessary to understand the molecular evolution of complex organisms.

© The Author 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

Research article. Mol. Biol. Evol. 29(10):3121–3129. 2012 doi:10.1093/molbev/mss116 Advance Access publication April 12, 2012

at National Chiao Tung University Library on April 28, 2014

http://mbe.oxfordjournals.org/

To address this issue, we analyzed the effects of exon features (described below) on the variation of exonic evolutionary rates in mammals. We calculated the nonsynonymous substitution rate (dN), synonymous substitution rate (dS), and the dN/dS ratio for exons in human–mouse and human–rhesus macaque one-to-one orthologous genes. To account for the intercorrelations between evolutionary rate determinants, principal component regression (PCR) was used to analyze the relative contributions of exon features to the variances of dN, dS, and the dN/dS ratio. PCR outperformed multivariate regression and partial correlation in delineating the relationships among multiple intercorrelated factors when the data were noisy (Drummond et al. 2006). In this study, 13 exon features that may affect evolutionary rates were analyzed (table 1): weighted exon frequency (WEF, see Materials and Methods), ASE/CSE exon type, exonic expression level, coefficient of variation in exonic expression levels across multiple tissues, exonic expression breadth, percent of IDR, percent of annotated protein domain(s), proportion of amino acids predicted to be solvent accessible, the lengths of 5′ and 3′ flanking introns, exon duplicability (Materials and Methods), exon length, and G + C content. We demonstrate that the features related to the splicing and structural–functional constraints of exons are the most important in causing within-protein variation in evolutionary rates in mammals.
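As a rough illustration of the PCR idea described above, the following minimal sketch standardizes the predictors, rotates them into orthogonal principal components, and reports the share of variance in the response attributable to each component. It is not the authors' R script (which was adapted from Drummond et al. 2006); the function name `pcr_variance_explained` and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def pcr_variance_explained(X, y):
    """Return (loadings, r2_per_component) for a principal component regression.

    X: (n_samples, n_features) predictor matrix; y: (n_samples,) response.
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Standardize predictors and center the response.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    # Principal axes come from the SVD of the standardized predictors.
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U * s  # component scores; columns are mutually orthogonal
    # With orthogonal scores, each component's share of the variance in y
    # is its squared correlation with y, so the shares simply add up.
    ss_total = yc @ yc
    r2 = (scores.T @ yc) ** 2 / ((s ** 2) * ss_total)
    return Vt, r2
```

Because the components are uncorrelated, the per-component shares sum to the R² of an ordinary regression on all predictors, which is what makes it possible to apportion explained variance among intercorrelated features.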

Materials and Methods

Source Data and Calculation of Evolutionary Rates

The human–mouse and human–rhesus macaque one-to-one orthologous genes and the corresponding transcript and peptide sequences were retrieved from Ensembl v59 through the BioMart interface (http://www.biomart.org) (Guberman et al. 2011). To ensure data quality, we retained only full-length transcripts (with start and stop codons) that have known protein products. To avoid unequal weighting between genes, we selected the longest transcript from each human gene as the representative. We then aligned the human peptide sequence against the orthologous mouse/macaque peptide sequences (i.e., peptides derived from one-to-one orthologous gene pairs) using MUSCLE (Edgar 2004). The longest alignable mouse/macaque peptide orthologue was retained. The peptide sequence alignments were then back translated to nucleotide sequences. The boundaries of "orthologous exons" were defined according to Ensembl human exon annotations. All gaps in the transcript alignments were discarded, so our approach did not consider lineage-specific gains/losses of exons. We calculated the dN, dS, and dN/dS of each pair of orthologous exons using the codeml module of PAML 4 (Yang 2007). To ensure accurate estimates of evolutionary rates, only exons longer than 81 bp were included (Nekrutenko et al. 2002; Chen et al. 2011). For the human–mouse comparison, our final data set included 5,416 human–mouse orthologous gene pairs comprising 21,706 orthologous exon pairs (table 2). For the human–rhesus macaque comparison, our final data set included 4,609 orthologous genes comprising 14,434 orthologous exon pairs (table 2). Compared with the number of human–mouse orthologous genes, there were fewer human–rhesus macaque orthologous genes in our analysis because the macaque draft genome had not been completely sequenced.

To control for differences in exon features of different genes, we calculated the differences in evolutionary rates (dN, dS, and dN/dS) and the differences in exon features for all possible two-exon combinations of the same transcript. Using PCR, we examined how much of the variance in exon-level dN, dS, or dN/dS was explained by exon features. Our data set for this within-gene analysis included 81,260 human–mouse exon pairs (combinations) and 37,508 human–rhesus macaque exon pairs (table 2).
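The within-gene differencing step can be sketched as follows. This is a minimal illustration under stated assumptions: the input layout (a dict of per-exon value dicts keyed by transcript ID), the function name `within_gene_differences`, and the sign convention (first exon minus second) are all hypothetical, since the text does not specify them.

```python
from itertools import combinations

def within_gene_differences(exons_by_transcript):
    """List per-pair differences for all two-exon combinations of each transcript.

    exons_by_transcript: dict mapping a transcript ID to a list of dicts,
    one per exon, each holding numeric values under the same keys
    (e.g. "dN", "dS", or any exon feature). Layout is an assumption.
    """
    rows = []
    for transcript_id, exons in exons_by_transcript.items():
        # combinations() yields every unordered pair of exons exactly once.
        for a, b in combinations(exons, 2):
            diff = {key: a[key] - b[key] for key in a}
            rows.append((transcript_id, diff))
    return rows
```

A transcript with n retained exons contributes n(n-1)/2 rows, so transcripts with a single retained exon contribute nothing to the within-gene data set.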

Measuring Exon Features

Table 1. Exon Features Included in PCR.

Category                          Exon features
Splicing features                 WEF
                                  ASE/CSE exon type
Expression features               Average expression level
                                  Coefficient of variation of expression level
                                  Expression breadth
Structural–functional features    Proportion of IDR
                                  Proportion of Pfam domain
                                  Proportion of solvent accessible region
Compactness features              Length of 5′ flanking intron
                                  Length of 3′ flanking intron
Exon duplicability                Exon duplicability
Other features                    Exon length
                                  G + C content

Table 2. The Human–Mouse and Human–Rhesus Macaque Orthologous Genes and Exons for Within-Gene and Between-Gene Analyses.

                          Human–Mouse    Human–Rhesus Macaque
Number of genes           5,416          4,609
Number of exons           21,706         14,434
Number of exon pairs(a)   81,260         37,508

(a) The total number of within-gene exon combinations.

ASE and CSE classification (Shabalina et al. 2010) and WEF calculation were done using in-house PERL scripts. WEF is defined as the length-weighted average of the frequency of an exon (supplementary fig. S1, Supplementary Material online). Here, the frequency of an exon measures its relative importance and is calculated as the percent of transcript isoforms of a gene that include this specific exon. For example, CSEs have an exon frequency of 100% because they always occur in different isoforms. CSEs are considered to be indispensable for the biological functions of their transcript/protein. We assume that an exon's importance is reflected in how frequently it appears in different transcript isoforms. In the case of partially overlapping exons (supplementary fig. S1, Supplementary Material online), the exon boundaries may be ambiguous, and the frequencies of these exons are hard to define. For such cases, WEF gives a reasonable quantitative measure of the frequency that an exon is used in splicing events.
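The exon frequency and WEF described above might be computed along the following lines. The exact WEF procedure is given only in supplementary fig. S1, which is not reproduced here, so the segment-based weighting below is an assumption: an exon with ambiguous boundaries is treated as a set of sub-segments, each with its own length and inclusion frequency. Both function names are hypothetical.

```python
def exon_frequency(exon, isoforms):
    """Percent of isoforms (each a set of exon IDs) that include `exon`."""
    return 100.0 * sum(exon in iso for iso in isoforms) / len(isoforms)

def weighted_exon_frequency(segments):
    """Length-weighted average frequency.

    segments: list of (length_bp, frequency_percent) tuples, one per
    sub-segment of the exon. Splitting an exon into segments is an
    assumption about how overlapping exon variants are handled.
    """
    total_length = sum(length for length, _ in segments)
    return sum(length * freq for length, freq in segments) / total_length
```

Under this reading, a constitutively spliced exon with unambiguous boundaries has a single segment at 100%, so its WEF equals its plain exon frequency.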

IDRs were predicted by using Disopred (Ward et al. 2004). Pfam protein domain information was retrieved from the Ensembl Database (http://www.ensembl.org), and the percent of annotated protein domain(s) of each exon was calculated. Solvent-accessible amino acid residues were predicted by using the ACCpro module of the SCRATCH package (Cheng et al. 2005) with a 30% exposure threshold. 5′ and 3′ intron length, exon length, and G + C content of the exons were calculated using in-house PERL scripts on the genomic sequences downloaded from BioMart. The first and last coding exons of each transcript were excluded because they have only a 3′ flanking intron and only a 5′ flanking intron, respectively. Exon duplicability was evaluated by Blast-aligning each exon to the entire transcriptome. A potential exon duplicate was defined as a Blast hit that is ≥90% alignable and ≥90% identical to the query exon. The total number of Blast hits matching these criteria was defined as the duplicability of an exon.
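The duplicability count can be illustrated with a short filter over tabulated Blast hits. This is a sketch under assumptions: "≥90% alignable" is interpreted here as alignment length divided by query exon length, the hit tuple layout is hypothetical, and self-hits are not treated specially because the text does not say how they were handled.

```python
def exon_duplicability(hits, query_lengths, min_coverage=0.90, min_identity=90.0):
    """Count qualifying Blast hits per query exon.

    hits: iterable of (query_id, subject_id, alignment_length, percent_identity).
    query_lengths: dict mapping query exon ID to its length in bp.
    The tuple layout and the coverage definition are assumptions.
    """
    counts = {query: 0 for query in query_lengths}
    for query, _subject, aln_len, pct_identity in hits:
        coverage = aln_len / query_lengths[query]
        # Keep hits that are both long enough and similar enough.
        if coverage >= min_coverage and pct_identity >= min_identity:
            counts[query] += 1
    return counts
```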

The expression features of the exons were derived from two published human RNA-seq data sets (GSE12946 and GSE13652) (Pan et al. 2008; Wang et al. 2008) archived in Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). These data sets cover 11 human tissues (adipose, brain, breast, cerebral cortex, colon, heart, liver, lung, lymph node, skeletal muscle, and testes). The 32-mer RNA-seq sequences were mapped to the human genome (hg19) using SeqMap (Jiang and Wong 2008). Similar to a previously described approach (Sultan et al. 2008; Qian et al. 2010), the exon-specific transcriptional abundance was defined as the total number of RNA-seq reads uniquely mapped onto the exon divided by the number of unique n-mers per exon, where n = 32. The exon-specific transcriptional abundances were averaged over all of the analyzed tissues to represent the expression level of an exon. To measure the expression breadths of exons, the exon-specific transcriptional abundances were sorted for each tissue, and the top 50% of the exons were defined as expressed in a certain tissue. The expression breadth of an exon was then calculated as the proportion of tissues in which this exon was expressed (transcriptional abundance > 0). The coefficient of variation was calculated by dividing the standard deviation of exon-specific transcriptional abundances by the mean of exon-specific transcriptional abundances across the 11 tissues for each exon. The PCR analyses were conducted in R (http://www.r-project.org/) using modified scripts from Drummond et al. (2006).
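The three expression features can be sketched from a table of per-tissue abundances as shown below. Several details are assumptions rather than the authors' procedure: the "top 50%" rule is approximated by requiring an abundance strictly above the per-tissue median (which also implies an abundance above zero), the population standard deviation is used for the coefficient of variation, and the function name and input layout are hypothetical.

```python
from statistics import mean, median, pstdev

def expression_features(abundance):
    """Compute expression level, breadth, and coefficient of variation per exon.

    abundance: dict mapping exon ID to a list of per-tissue transcriptional
    abundances, with tissues in the same order for every exon.
    """
    n_tissues = len(next(iter(abundance.values())))
    # Per-tissue cutoff: exons above the median are taken as the top half.
    cutoffs = [median(values[t] for values in abundance.values())
               for t in range(n_tissues)]
    features = {}
    for exon, values in abundance.items():
        level = mean(values)
        expressed = sum(1 for t in range(n_tissues) if values[t] > cutoffs[t])
        # CV is undefined when the mean abundance is zero.
        cv = pstdev(values) / level if level > 0 else float("nan")
        features[exon] = {
            "level": level,
            "breadth": expressed / n_tissues,
            "cv": cv,
        }
    return features
```

Ties at the median are counted as not expressed in this sketch; a different tie rule would change the breadth of exons sitting exactly at the cutoff.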

Results

Characteristics of Exons

To evaluate the determinants of exon-level evolutionary rates, we had to control for gene-level differences. For example, expression level may differ by a much larger extent between genes than between exons of the same genes. Therefore, the results from PCR analyses based on pooled exon comparisons may to some extent reflect the gene-level differences. To address this issue, we calculated the between-exon differences in human–mouse evolutionary rates (dN/dS, dN, and dS) and the 13 exon features for exons of the same transcript. We performed PCR analyses separately for dN/dS, dN, and dS against the exon features.

The composition of principal components classified the 13 exon features into six biologically meaningful categories (table 1): 1) mRNA splicing features: WEF and ASE/CSE exon type (ASE = 0; CSE = 1); 2) exon-level RNA expression features: expression level, coefficient of variation in expression level, and expression breadth; 3) structural–functional features: percent of IDR, percent of Pfam protein domain(s), and proportion of amino acid residues predicted to be solvent accessible; 4) gene compactness features: the lengths of 5′ and 3′ flanking introns; 5) exon duplicability; and 6) other features: exon length and G + C content (supplementary table S1, Supplementary Material online).

Structural–Functional Features and Splicing Features Are the Two Most Important Determinants of Within-Gene dN/dS Variations

As shown in figure 1, the primary principal component for exonic dN/dS is composed mainly of variation in structural–functional features (component 3 in the left panel in fig. 1A), and the secondary component (component 2) consists mainly of variation in splicing features. Although variation in expression features (component 1) is a well-known determinant of dN/dS at the protein level (Pal et al. 2006), it ranked sixth in explaining exon-level dN/dS variation. A similar trend was observed for dN (fig. 1B, left panel). Meanwhile, other features dominated the third and the fourth most important components for exonic dN/dS. For dS, the two most important components were composed mainly of splicing features and structural–functional features (fig. 1C, left panel).

We then calculated the total contribution of each feature category to the variance in dN/dS. Some feature categories dominated multiple components, so we summed the percentages of variance explained by components that were dominated by the same feature category. A component was dominated by a feature category if the feature category accounted for more than 50% of this component. If none of the feature categories exceeded the 50% threshold, the component was designated as a "mixed" component. For dN/dS, structural–functional features, splicing features, and expression features accounted for approximately 3.4%, 2.8%, and 0.5% of the variance explained, respectively (fig. 1A, right panel). For dN, the three feature categories accounted for 3.5%, 2.6%, and 0.5% of the variance explained, respectively (fig. 1B, right panel). For dS, the three feature categories accounted for 1.7%, 1.5%, and 0.2% of the variance explained, respectively. Unexpectedly, other features accounted for a considerable percent of variance explained for dN/dS (~2.2%), dN (~2.3%), and dS (0.6%) (fig. 1A–C, right panel). Although compactness features were previously suggested to be a dominant factor affecting gene-level dN/dS (Liao et al. 2006), they accounted for a relatively small percent (<0.1%) of exonic dN/dS, dN, and dS. Similarly, although gene duplication has a strong effect on evolutionary rates, exon duplicability explained <0.1% of the variance of exon-level evolutionary rates (fig. 1, right panel). (For detailed human–mouse PCR results broken down by each exon feature, see supplementary tables S2–S4, Supplementary Material online.)
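The rule for assigning components to feature categories and summing their contributions can be sketched as follows. One point is an assumption: the text does not state how a category's share of a component is measured, so this sketch uses the sum of squared loadings of that category's features, normalized within the component. The function name and argument layout are hypothetical.

```python
def category_contributions(loadings, variance_explained, feature_category,
                           threshold=0.5):
    """Sum percent variance explained over components dominated by each category.

    loadings: list of components, each a list of loadings (one per feature).
    variance_explained: percent variance explained by each component.
    feature_category: category name of each feature, in loading order.
    """
    totals = {}
    for weights, pct in zip(loadings, variance_explained):
        squared = [w * w for w in weights]
        norm = sum(squared)
        # Share of this component attributable to each feature category.
        share = {}
        for sq, category in zip(squared, feature_category):
            share[category] = share.get(category, 0.0) + sq / norm
        top_category, top_share = max(share.items(), key=lambda kv: kv[1])
        # A component with no category above the threshold is "mixed".
        label = top_category if top_share > threshold else "mixed"
        totals[label] = totals.get(label, 0.0) + pct
    return totals
```

Because the label flips when a share crosses the threshold, a component whose leading category sits near 50% can move between that category and "mixed" from one data set to another.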

To evaluate these results across smaller genetic distances, we repeated the analyses for the human–rhesus macaque comparison. We obtained similar results for dN/dS and dN, with structural–functional features, splicing features, and expression features accounting for ~1.5%, 0.7%, and 0.3% of dN/dS variance, respectively (fig. 2A). For dN, these features accounted for 1.4%, 0.7%, and 0.2%, respectively (fig. 2B). Notably, the percent of variance in evolutionary rates explained by the exon features was generally larger in the human–mouse comparison than in the human–rhesus macaque comparison, possibly due to the relatively low sequence quality of the rhesus macaque genome and the small genetic distance between human and rhesus macaque. The principal components for variation in dS were different between the two data sets. In the human–rhesus macaque comparison, the importance of splicing and structural–functional features was significantly reduced, whereas the importance of other features increased (fig. 2C). Therefore, the primary determinant of exon-level dS in mammals remains inconclusive. The detailed human–rhesus macaque PCR results (broken down to 13 exon features) are given in supplementary tables S5–S7 (Supplementary Material online).

FIG. 1. The principal components that affect human–mouse within-gene exonic evolutionary rate variations (left panel) and the percent of variance explained by each category of exon features (right panel) for (A) dN/dS ratio, (B) dN, and (C) dS. Note that only the six most important components are shown.

The Effects of Gene-Level Characteristics on the Evolutionary Rates of Exons

Unlike previous findings based on analyses of full-length mammalian proteins (Liao et al. 2006; Drummond and Wilke 2008), expression features and compactness features accounted for only a small percent of variance in exon-level dN/dS and dN. This is possibly because these two feature categories, especially expression features, differed to a greater extent between genes than between exons of the same genes. Therefore, the effects of these two features were less significant in the intragenic analyses. To examine this possibility, we randomly selected 81,260 human–mouse and 37,508 human–rhesus macaque exon pairs from different genes without replacement and conducted a between-gene PCR analysis. We then summed the contributions of each of the feature categories as described above. We repeated this analysis on randomly selected exon sets 500 times and generated boxplots of percent variance explained for each feature category (figs. 3 and 4). In the human–mouse comparison (fig. 3), expression features are more important in affecting the variances in dN/dS and dN than splicing features and structural–functional features (fig. 3A and B). For the variance in exon-level dS, splicing features were the most important, followed by structural–functional features and expression features (fig. 3C).
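One way to draw a random set of between-gene exon pairs, as in the resampling described above, is sketched below. The rejection-sampling approach, the function name, and the exon-to-gene mapping are assumptions for illustration; the text does not describe how the authors implemented the draw.

```python
import random
from collections import Counter

def sample_between_gene_pairs(exon_gene, n_pairs, rng):
    """Draw n_pairs distinct exon pairs whose members come from different genes.

    exon_gene: dict mapping exon ID to gene ID.
    rng: a random.Random instance, so that each replicate can be seeded.
    """
    exons = sorted(exon_gene)
    n = len(exons)
    per_gene = Counter(exon_gene.values())
    # All pairs minus same-gene pairs gives the number of eligible pairs.
    max_pairs = n * (n - 1) // 2 - sum(c * (c - 1) // 2 for c in per_gene.values())
    if n_pairs > max_pairs:
        raise ValueError("not enough between-gene exon pairs to sample")
    pairs = set()
    while len(pairs) < n_pairs:
        a, b = rng.sample(exons, 2)
        if exon_gene[a] != exon_gene[b]:
            # Store each pair in a canonical order so duplicates collapse.
            pairs.add((a, b) if a < b else (b, a))
    return sorted(pairs)
```

Repeating the draw with different seeds and rerunning the regression on each set would give a distribution of percent variance explained per feature category, which is the kind of quantity summarized in a boxplot.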

For the human–rhesus macaque comparison, the results were similar for dN/dS and dN (fig. 4A and B). For dS, splicing features remained relatively important. However, the contributions of structural–functional features and mixed features varied to a large extent. This is because in many cases, structural–functional features accounted for either slightly less or slightly more than 50% of a component. In the former case, the component was considered a mixed component, whereas in the latter case, it was designated as dominated by structural–functional features. These variations in component designation caused the large variations in figure 4C.

FIG. 2. The principal components that affect human–macaque within-gene exonic evolutionary rate variations (left panel) and the percent of variance explained by each category of exon features (right panel) for (A) dN/dS ratio, (B) dN, and (C) dS. Note that only the six most important components are shown.

Discussion

In this study, we analyzed the contributions of 13 exon features (table 1) to exonic evolutionary rates using both within- and between-gene comparisons. The 13 features of exons were classified into six major categories based on the principal component analysis: splicing features, expression features, structural–functional features, gene compactness features, duplicability, and other features composed of G + C content and exon length. Although other features contributed to dN, dS, and dN/dS at an appreciable level (figs. 1–4), a biological interpretation of this category is currently lacking and requires further exploration. We cannot exclude the possibility that our data sets contain noise that cannot be easily eliminated using PCR analyses. Alternatively, some important exon features might not have been included, leaving a considerable proportion of variance in evolutionary rates unexplained.

The within-gene analyses (figs. 1 and 2) controlled for between-gene differences in exon features and indicated that structural–functional features and splicing features are the two most important determinants of exon-level dN/dS and dN. By contrast, between-gene analyses (figs. 3 and 4) indicated that expression features have larger effects on exon-level dN/dS and dN variance than structural–functional and splicing features. Taken together, our results suggest that the differences in gene-level biological features (especially expression features) may set the coarse background of protein evolution at the gene level, upon which exon features fine-tune the within-gene variations in protein evolutionary rates. Because between-gene variation in expression features has strong effects on dN/dS and dN, controlling for the expression features revealed the significant effects of exon-level features, such as splicing features and structural–functional features.

FIG. 3. The distributions of percent variance in human–mouse evolutionary rates explained by different categories of exon features as generated by 500 random sets of exon pairs from different genes: (A) dN/dS ratio, (B) dN, and (C) dS. Upper quartile, median, and lower quartile values are indicated in each box. Bars outside the box indicate 1.5-fold interquartile range from the upper and lower quartile.

Gene compactness was identified as more important than expression level in affecting dN/dS at the gene level (Liao et al. 2006). However, at the exon level, gene compactness only has minor contributions to dN/dS (figs. 1–4). By contrast, expression features were an important contributor to exonic dN/dS variations in the between-gene analysis (figs. 3 and 4). The increased importance of gene expression and decreased importance of gene compactness on exon-level dN/dS variance may reflect differences in the source data between the gene-level and exon-level analyses. Notably, the gene-level study incorporated microarray expression data, whereas the present exon-level study incorporated RNA-seq expression data (Pan et al. 2008). For the microarray data, probes were not located within all exons of a gene. As a result, microarrays do not precisely measure mRNA abundance, especially for alternatively spliced genes. Furthermore, the expression signals measured by hybridization methods are affected by probe-target affinity, which can vary for probes within the same transcript (Liao and Zhang 2006). Therefore, sequencing-based methods, such as RNA-seq, may have better accuracy and resolution than array-based methods in measuring exon-level expression properties. Another potential reason for the decreased effect of gene compactness on dN/dS might be the inclusion of splicing features (which have been overlooked in previous studies) in the present analyses.

FIG. 4. The distributions of percent variance in human–macaque evolutionary rates explained by different categories of exon features as generated by 500 random sets of exon pairs from different genes: (A) dN/dS ratio, (B) dN, and (C) dS. Upper quartile, median, and lower quartile values are indicated in each box. Bars outside the box indicate 1.5-fold interquartile range from the upper and lower quartile.

It is unexpected that splicing features and structural–functional features are more important than expression features and compactness features in affecting within-gene exon-level dN/dS differences (figs. 1 and 2). Many unicellular organisms, such as yeast, contain few introns and rarely implement alternative splicing. By contrast, complex multicellular organisms implement alternative splicing as a mechanism for gene regulation. Thus, lineage-specific properties, such as splicing features, can be an important determinant in affecting within-gene variation in protein evolutionary rate.

In a previous PCR study that examined gene-level evolutionary rates in yeast, the percent of dN/dS variation accounted for by the most influential factors (expression level, codon bias, and protein abundance) was as large as ~40% (Drummond et al. 2006). By contrast, the contributions to exon-level evolutionary rate variation of the most influential categories are smaller (~3%, fig. 1). There are several possible reasons for this difference in explainable variance. First, the significantly reduced effective population sizes in multicellular organisms have led to a decrease in the efficiency of selection, thereby weakening the correlation between dN/dS and the examined exon features. Second, in multicellular organisms, tissue differentiation results in genes that are expressed in multiple tissues and subject to complex regulation and selection pressures. In mammals, natural selection may have targeted not only individual biological factors, such as exon features, but also factors associated with spatial–temporal interactions (Gu and Su 2007). Thus, the relatively small percent of explainable dN/dS variance may reflect our limited knowledge of the targets of selection in complex organisms. Consistent with this notion, previous studies showed that the percent of variance in dN/dS explained by a single biological factor is smaller in mammals than in yeast (Liao et al. 2006, 2010). Third, although we filtered out short exons (Materials and Methods), the average length of the analyzed exons is shorter than the lengths of genes. Therefore, the estimates of exonic evolutionary rates and other exon features may be less accurate and subject to large variation. In other words, exon-level data may be noisier than gene-level data. In addition, the within-gene differences in biological features and evolutionary rates can be fairly small. The signal-to-noise ratio in the exon-level analysis is thus limited, which may have reduced the explanatory power of the exon features. Regardless of the amount of variance explained, the human–mouse and human–rhesus macaque comparisons yielded consistent results for how exon features explain variation in dN/dS and dN.

By analyzing the effects of 13 exon features on exon-level evolutionary rate, we demonstrate the predominant roles of splicing features and structural–functional features in determining dN/dS and dN of mammalian exons. Our results clearly demonstrate that gene-level and exon-level variations in dN/dS and dN are affected by different biological properties of DNA. Our findings thus may shed new light on the sources of evolutionary rate variations within mammalian genes.

Supplementary Material

Supplementary figure S1 and tables S1–S7 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This study was supported by the intramural funding of National Health Research Institutes, Taiwan, to both F.-C.C. and B.-Y.L. and research grant (NSC 99-2311-B-400-003-MY2) from the National Science Council, Taiwan, to B.-Y.L. We thank Dr Pei-Shen Lin for statistical advice. We also thank two anonymous reviewers for constructive comments.

References

Akashi H. 2003. Translational selection and yeast proteome evolution.Genetics 164:1291–1303.

Bloom JD, Drummond DA, Arnold FH, Wilke CO. 2006. Structural determinants of the rate of protein evolution in yeast.Mol Biol Evol. 23:1751–1761.

Brown CJ, Johnson AK, Daughdrill GW. 2010. Comparing models of evolution for ordered and disordered proteins.Mol Biol Evol. 27:609–621.

Brown CJ, Takayama S, Campen AM, Vise P, Marshall TW, Oldfield CJ, Williams CJ, Dunker AK. 2002. Evolutionary rate heterogeneity in proteins with long disordered regions.J Mol Evol. 55:104–110. Castillo-Davis CI, Hartl DL. 2003. Conservation, relocation and

duplication in genome evolution.Trends Genet. 19:593–597. Chen FC, Chaw SM, Tzeng YH, Wang SS, Chuang TJ. 2007. Opposite

evolutionary effects between different alternative splicing patterns.Mol Biol Evol. 24:1443–1446.

Chen FC, Chuang TJ. 2007. Different alternative splicing patterns are subject to opposite selection pressure for protein reading frame preservation.BMC Evol Biol. 7:179.

Chen FC, Pan CL, Lin HY. 2011. Independent effects of alternative splicing and structural constraint on the evolution of mammalian coding exons.Mol Biol Evol. 29:187–193.

Chen FC, Wang SS, Chen CJ, Li WH, Chuang TJ. 2006. Alternatively and constitutively spliced exons are subject to different evolutionary forces.Mol Biol Evol. 23:675–682.

Cheng J, Randall AZ, Sweredoski MJ, Baldi P. 2005. SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res. 33:W72–W76.

Drummond DA, Raval A, Wilke CO. 2006. A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 23:327–337.

Drummond DA, Wilke CO. 2008. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134:341–352.

Duret L, Mouchiroud D. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol. 17:68–74.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–1797.

Franzosa EA, Xia Y. 2009. Structural determinants of protein evolution are context-sensitive at the residue level. Mol Biol Evol. 26:2387–2395.

Fraser HB. 2005. Modularity and evolutionary constraint on proteins. Nat Genet. 37:351–352.

Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. 2002. Evolutionary rate in the protein interaction network. Science 296:750–752.

Gu X, Su Z. 2007. Tissue-driven hypothesis of genomic evolution and sequence-expression correlations. Proc Natl Acad Sci U S A. 104:2779–2784.

Guberman JM, Ai J, Arnaiz O, et al. (60 co-authors). 2011. BioMart Central Portal: an open database network for the biological community. Database (Oxford) 2011:bar041.

Hahn MW, Kern AD. 2005. Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol. 22:803–806.

Hastings KE. 1996. Strong evolutionary conservation of broadly expressed protein isoforms in the troponin I gene family and other vertebrate gene families. J Mol Evol. 42:631–640.

He X, Zhang J. 2006. Toward a molecular understanding of pleiotropy. Genetics 173:1885–1891.

Hirsh AE, Fraser HB. 2001. Protein dispensability and rate of evolution. Nature 411:1046–1049.

Jiang H, Wong WH. 2008. SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics 24:2395–2396.

Jordan IK, Rogozin IB, Wolf YI, Koonin EV. 2002. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12:962–968.

Keren H, Lev-Maor G, Ast G. 2010. Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet. 11:345–355.

Kim PM, Korbel JO, Gerstein MB. 2007. Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci U S A. 104:20274–20279.

Li W-H. 1997. Molecular evolution. Sunderland (MA): Sinauer Associates.

Liao BY, Scott NM, Zhang J. 2006. Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Mol Biol Evol. 23:2072–2080.

Liao BY, Weng MP, Zhang J. 2010. Impact of extracellularity on the evolutionary rate of mammalian proteins. Genome Biol Evol. 2:39–43.

Liao BY, Zhang J. 2006. Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol. 23:1119–1128.

Nekrutenko A, Makova KD, Li WH. 2002. The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res. 12:198–202.

Nembaware V, Crum K, Kelso J, Seoighe C. 2002. Impact of the presence of paralogs on sequence divergence in a set of mouse-human orthologs. Genome Res. 12:1370–1376.

Pal C, Papp B, Hurst LD. 2001a. Highly expressed genes in yeast evolve slowly. Genetics 158:927–931.

Pal C, Papp B, Hurst LD. 2001b. Does the recombination rate affect the efficiency of purifying selection? The yeast genome provides a partial answer. Mol Biol Evol. 18:2323–2326.

Pal C, Papp B, Lercher MJ. 2006. An integrated view of protein evolution. Nat Rev Genet. 7:337–348.

Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. 2008. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 40:1413–1415.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26:425–430.

Ramensky VE, Nurtdinov RN, Neverov AD, Mironov AA, Gelfand MS. 2008. Positive selection in alternatively spliced exons of human genes. Am J Hum Genet. 83:94–98.

Shabalina SA, Spiridonov AN, Spiridonov NA, Koonin EV. 2010. Connections between alternative transcription and alternative splicing in mammals. Genome Biol Evol. 2:791–799.

Sorek R, Shamir R, Ast G. 2004. How prevalent is functional alternative splicing in the human genome? Trends Genet. 20:68–71.

Subramanian S, Kumar S. 2004. Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168:373–381.

Sultan M, Schulz MH, Richard H, et al. (16 co-authors). 2008. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321:956–960.

Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB. 2008. Alternative isoform regulation in human tissue transcriptomes. Nature 456:470–476.

Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT. 2004. The DISOPRED server for the prediction of protein disorder. Bioinformatics 20:2138–2139.

Winter EE, Goodstadt L, Ponting CP. 2004. Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res. 14:54–61.

Xia Y, Franzosa EA, Gerstein MB. 2009. Integrated assessment of genomic correlates of protein evolutionary rate. PLoS Comput Biol. 5:e1000413.

Xing Y, Lee C. 2005. Evidence of functional selection pressure for alternative splicing events that accelerate evolution of protein subsequences. Proc Natl Acad Sci U S A. 102:13526–13531.

Xing Y, Lee C. 2007. Relating alternative splicing to proteome complexity and genome evolution. Adv Exp Med Biol. 623:36–49.

Yang J, Gu Z, Li WH. 2003. Rate of protein evolution versus fitness effect of gene deletion. Mol Biol Evol. 20:772–774.

Yang JR, Liao BY, Zhuang SM, Zhang J. 2012. Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci U S A. 109:E831–E840.

Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24:1586–1591.

Zhang J, He X. 2005. Significant impact of protein dispensability on the instantaneous rate of protein evolution. Mol Biol Evol. 22:1147–1155.

Zhang L, Li WH. 2004. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol. 21:236–239.

Zhou T, Drummond DA, Wilke CO. 2008. Contact density affects protein evolutionary rate from bacteria to animals. J Mol Evol. 66:395–404.

Table 2. The Human–Mouse and Human–Rhesus Macaque Orthologous Genes and Exons for Within-Gene and Between-Gene Analyses
