Chapter 4 Discovery of Protein Kinase-Substrate Phosphorylation Networks

4.8 Web-based System of RegPhos

To facilitate the investigation of protein kinases and their substrates, a web-based system was implemented that lets users browse protein kinases and their substrate proteins efficiently. The system provides three major functions: browsing kinases or substrates, constructing phosphorylation networks, and microarray expression analysis. The “quick search” box accepts the name of a kinase or substrate of interest. As shown in Figure 4.21 for the substrate CEBPB, users can examine the protein description, subcellular localization, functional domains, tertiary structure, and phosphorylation sites together with their catalytic kinases. Both the experimentally verified and the RegPhos-identified kinase-specific phosphorylation sites are provided. The Jmol viewer is adopted to visualize PDB structures.

Figure 4.21 Graphical visualization of substrate protein with catalytic kinases.

To investigate the expression correlation between kinases and substrates, human gene expression samples from the Affymetrix GeneChip Human Genome U133 Array Set HG-U133A platform (GPL96), consisting of 22283 probe sets for 12678 genes, are used for the co-expression analysis of kinase and substrate genes. The first problem we faced was which microarray experiments to select for this analysis. Having no specific interest or limitation, we decided to focus on the microarray experiment series for which raw data are available. In total, 2714 samples within 98 experiment series (GSE) are provided in the web-based system. The Pearson correlation coefficient between the gene expression patterns of a kinase and a substrate is calculated in each of the 98 experiment series. As shown in Figure 4.22, the expression correlation of kinase CDC2 and substrate p53 across the 98 experiment series is provided, and users can examine the expression patterns of the CDC2 and p53 genes in detail.
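The per-series correlation described above can be illustrated with a minimal sketch. It assumes that the expression values of each gene are already grouped by experiment series and listed in the same sample order; the function names, the series identifiers, and the toy values are hypothetical and are not taken from the RegPhos implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / sqrt(var_x * var_y)


def coexpression_by_series(kinase_expr, substrate_expr):
    """Map each series ID to the kinase-substrate Pearson correlation.

    Both arguments are dicts: series ID -> list of expression values,
    with samples in the same order for the two genes (an assumption).
    """
    return {
        series: pearson(kinase_expr[series], substrate_expr[series])
        for series in kinase_expr
        if series in substrate_expr
    }


# Toy values only; "GSE_A" and "GSE_B" are placeholder series names.
cdc2 = {"GSE_A": [1.0, 2.0, 3.0, 4.0], "GSE_B": [5.0, 3.0, 1.0]}
tp53 = {"GSE_A": [2.0, 4.0, 6.0, 8.0], "GSE_B": [1.0, 2.0, 3.0]}
result = coexpression_by_series(cdc2, tp53)
# GSE_A is perfectly correlated, GSE_B perfectly anti-correlated.
```

Keeping one coefficient per series, rather than pooling all samples, mirrors the text: each experiment series is a separate condition, so a kinase-substrate pair may be co-expressed in some series and not in others.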

Figure 4.22 The expression profile of kinase and substrate genes.

The proposed system lets users input a group of protein names and constructs a phosphorylation network for them, annotated with protein subcellular localization. To fully investigate how protein kinases control intracellular processes, the experimentally verified kinase-specific phosphorylation sites and the kinase-substrate interactions identified by RegPhos are combined to construct phosphorylation networks that start from membrane-associated receptor kinases and end at transcription factors located in the nucleus.

However, a phosphorylation-driven signal transduction pathway is not always a pure phosphorylation cascade. Some protein-protein interactions are also involved, such as the IRS1-GRB2, GRB2-SOS1, SOS1-HRAS, and HRAS-RAF1 interactions in the insulin signaling pathway. Figure 4.23 shows the insulin signaling network as an example of phosphorylation network construction. A group of proteins associated with the insulin signaling pathway is input to construct the network from membrane-associated proteins to nuclear proteins.
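One way to picture this construction is a graph search over two edge types: directed kinase-to-substrate phosphorylation edges and undirected protein-protein interaction edges. The sketch below is an illustration under stated assumptions, not the RegPhos algorithm. The breadth-first search, the function names, the localization labels, and the phosphorylation edges downstream of RAF1 are toy choices made here; only the four protein-protein interactions come from the text.

```python
from collections import deque


def build_graph(phospho_edges, ppi_edges):
    """Adjacency list: phosphorylation edges are directed, PPIs are undirected."""
    graph = {}
    for kinase, substrate in phospho_edges:
        graph.setdefault(kinase, []).append((substrate, "phosphorylation"))
        graph.setdefault(substrate, [])
    for a, b in ppi_edges:
        graph.setdefault(a, []).append((b, "interaction"))
        graph.setdefault(b, []).append((a, "interaction"))
    return graph


def path_to_compartment(graph, localization, start, target="nucleus"):
    """Breadth-first search from `start` to the nearest protein in `target`."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if localization.get(node) == target:
            path = []
            while node is not None:  # walk back to the start protein
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor, _edge_type in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None  # no protein in the target compartment is reachable


# Toy input: the PPIs are those named in the text; the phosphorylation
# edges and localization labels are illustrative assumptions.
phospho = [("INSR", "IRS1"), ("RAF1", "MAP2K1"),
           ("MAP2K1", "MAPK1"), ("MAPK1", "ELK1")]
ppi = [("IRS1", "GRB2"), ("GRB2", "SOS1"), ("SOS1", "HRAS"), ("HRAS", "RAF1")]
localization = {"INSR": "membrane", "ELK1": "nucleus"}

graph = build_graph(phospho, ppi)
path = path_to_compartment(graph, localization, "INSR")
```

In this toy network the search has to cross the four interaction edges to get from IRS1 to RAF1, which is the point made above: a network restricted to kinase-substrate pairs would not connect the receptor to the transcription factor.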

Figure 4.23 Example of insulin signaling network in the construction of phosphorylation network.

4.9 Summary

The desire to map phosphorylation networks has motivated the development of computational methods that investigate the substrate specificity of kinase-specific phosphorylation sites, based on the experimental identification of consensus sequence motifs recognized by the active sites of kinase catalytic domains. However, only 20% of experimental phosphorylation sites are annotated with their catalytic kinases, covering 350 kinases (67%).

The presented method is designed to link experimentally validated phosphorylation sites to protein kinases. Because signaling proteins are modular, in the sense that they contain domains (catalytic or interaction) and linear motifs (phosphorylation or binding sites) that mediate interactions between proteins [92], protein-protein interactions and protein associations are incorporated. The method exploits both the inherent propensity of kinase catalytic domains to phosphorylate particular sequence motifs and contextual information regarding the physical interaction, functional association, cellular co-localization, and coexpression of kinases and substrates.

In terms of the predictive power of the protein association context, physical protein interactions play the dominant role among the primary experimental data, whereas gene coexpression contributes very little. Physical protein interactions were imported and merged from numerous repositories, and the reliability of each individual interaction was assessed based on the promiscuity of the interaction partners. Gene coexpression was measured by calculating the Pearson correlation coefficient between two genes across 98 human gene expression experiment series of the Affymetrix GeneChip Human Genome U133 Array Set HG-U133A platform (GPL96), collected from the Gene Expression Omnibus repository.

In the evaluation, the improved predictive power gained from using the context of protein association underlines the importance of kinase-substrate interactions in the specificity of protein phosphorylation within cells. The predictive specificity for kinase groups with similar consensus motifs can be improved by considering protein association. We would also suggest that this underlines the utility of protein association data in modeling cellular processes. The identified kinase-substrate interactions were adopted to construct the intracellular phosphorylation networks. Furthermore, GEO microarray expression data were used to validate whether the kinase and substrate genes in the constructed phosphorylation networks have syn-expression patterns.

Chapter 5 Discussions

5.1 Characteristics

To fully investigate how protein kinases regulate intracellular processes, the comprehensive and accurate identification of kinase-specific substrates is necessary. Therefore, we propose a method, RegPhos, which combines a computational model with protein associations (protein-protein interactions, functional associations, and subcellular localization) to identify the catalytic kinase for each phosphoprotein with experimentally verified phosphorylation sites. To observe the expression relationship between kinase and substrate, gene expression microarray data are adopted to examine the expression of kinase and substrate genes in specific conditions, for instance in normal and cancerous tissue.

As an increasing number of in vivo phosphorylation sites have been identified, the desire to map the network of protein kinases and substrates has grown. The experimental kinase-specific substrates ultimately need to be combined by systems biology analysis, which translates the separate, large-scale datasets into signaling networks. Therefore, we combined the experimentally verified kinase-substrate interactions with the computationally identified kinase-substrate interactions to construct the intracellular phosphorylation network, starting from receptor kinases and ending at transcription factors, together with information on protein subcellular localization. Moreover, experimental expression evidence, such as gene microarray data, was adopted to validate the syn-expression of the constructed kinase-substrate phosphorylation network with statistical significance.

Comparison between RegPhos and NetworKIN

Rune Linding and co-authors have developed an approach, NetworKIN [112], that augments motif-based predictions with the network context of kinases and phosphoproteins. Table 5.1 lists the comparison between RegPhos and NetworKIN. NetworKIN collected experimental phosphorylation data from Phospho.ELM and adopted NetPhosK and Scansite for phosphorylation site prediction on 20 kinase families encompassing 112 individual kinases. The protein association database STRING, which integrates information from curated pathway databases, co-occurrence in abstracts, physical protein interaction assays, mRNA expression studies, and genomic context, is used to investigate the direct and indirect interactions between kinase and substrate. NetworKIN pinpoints the kinases responsible for specific phosphorylation events and yields a 2.5-fold improvement in the accuracy with which phosphorylation networks can be constructed.

Table 5.1 Comparison between RegPhos and NetworKIN.

| Method | NetworKIN | RegPhos |
|---|---|---|
| Species | Human | Human |
| Phosphorylation resource | Phospho.ELM | Phospho.ELM (7.0), UniProtKB/SwissProt (55.0), HPRD (7.0) and PHOSIDA (1.0) |
| Number of kinase families | 20 kinase families encompassing 112 individual kinases | 101 kinase families covering 300 kinases |
| Kinase-specific phosphorylation site prediction | 1. NetPhosK (neural network); 2. Scansite (position-specific matrix) | KinasePhos (SVM model trained with sequence and structural features); Blast (for individual kinases with fewer than 10 substrate sites) |
| Protein association context | Protein functional association database STRING | 1. Protein-protein interaction (DIP, MINT, IntAct, and HPRD); 2. Functional association (STRING); 3. Cellular localization (LOCATE, PSORTdb, OrganelleDB, UniProtKB, and GOA) |
| Method | Two-staged prediction: 1. Kinase-specific phosphorylation site prediction; 2. Protein association context | Logistic regression of: 1. Kinase-specific phosphorylation site prediction score; 2. Interacting depth of protein-protein interaction; 3. Confidence score of functional association; 4. Cellular localization |
| Gene expression analysis | - | 98 experiment series of Affymetrix HG-U133A platform (GPL96) |
| Predictive performance | 52% sensitivity and 64% accuracy for classifying 282 phosphorylation sites of PKC, CDK, PIKK, and INSR | 89% sensitivity and 91% accuracy for classifying 309 phosphorylation sites of PKC, CDK, PIKK, and INSR from HPRD (independent test) |
| Phosphorylation network | Only kinase-substrate pairs | 1. Using a graph-based method to construct phosphorylation networks starting from membrane receptors to transcription factors; 2. Using time-course microarray data to validate the discovered phosphorylation networks |

To compare the predictive power of RegPhos and NetworKIN, a similar dataset covering four well-known kinase groups (PKC, CDK, PIKK, and INSR) was used to evaluate the classifying power of RegPhos. In total, 309 phosphorylation sites that were independent of the training data were extracted from HPRD. By using a logistic regression model to integrate the phosphorylation site prediction with protein associations (protein-protein interactions, functional associations, and subcellular localization), RegPhos achieves a higher predictive accuracy than NetworKIN, especially for the INSR group. Finally, the constructed kinase-substrate phosphorylation networks with statistically significant co-expression in time-course microarray data are provided to users.
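The integration step can be sketched as a small logistic regression over the four features named in Table 5.1. This is a minimal illustration under stated assumptions, not the trained RegPhos model. The feature encoding (interaction depth as a reciprocal, co-localization as 0 or 1), the plain gradient descent fit, and all numeric values are hypothetical choices made for the example.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that a kinase-substrate pair is a true pair."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical feature vectors, one per candidate kinase-substrate pair:
# [site prediction score, 1 / PPI interaction depth,
#  functional association confidence, co-localized (1) or not (0)]
rows = [
    [0.9, 1.00, 0.8, 1], [0.8, 0.50, 0.7, 1], [0.7, 1.00, 0.9, 1],  # true pairs
    [0.3, 0.33, 0.2, 0], [0.6, 0.25, 0.1, 0], [0.2, 0.50, 0.3, 0],  # false pairs
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(rows, labels)
scores = [predict_proba(weights, bias, x) for x in rows]
```

Combining the features in one model lets the association context rescue a pair whose motif score alone is ambiguous, which is consistent with the comparison above, where kinase groups with similar consensus motifs benefit most from the added context.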