Chapter 4 Conclusions
4.2 Major contributions and future works
To our knowledge, the module family, which comprises a group of homologous modules, is the first approach to identify homologous modules of a module template across complete genomes through PPI families. We have developed a new method that identifies homologous modules from module templates derived from manually annotated protein complexes and crystal structures. Furthermore, the conserved and divergent internal PPIs of homologous modules provide clues for inferring the essential elements of modules.
To study the origin and diversity of novel phenotypes, we will focus on two questions:
“What is (are) the essential element(s) of life?” and “How is a new species formed?”
Some modules are evolutionarily cohesive; in other words, these cohesive modules are conserved across multiple species [64]. The relationships between connected modules allow construction of a module-module interaction network, which represents the connections between different functional modules in the interactome [65]. Mutations in intra-module proteins have less widespread effects, whereas inter-module proteins, which mediate the integration between modules, have a higher rate of amino-acid substitution [66, 67]. According to these previous studies, inter-module interactions undergo more evolutionary modifications than intra-module interactions.
Inter-module interactions of the RNA polymerase II module in human are mediated by protein-protein interactions such as POLR2B-MEN1, POLR2B-WWOX, and POLR2B-GSK3B (Fig. 15). In other words, the inter-module proteins that interact with POLR2B, namely MEN1, WWOX, and GSK3B, participate in other biological processes (BP annotations): proliferation, steroid metabolic process, and glycogen metabolic process, respectively.
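The distinction between intra-module and inter-module interactions, and the way inter-module PPIs collapse into a module-module interaction network, can be sketched as follows. This is a minimal illustration, not the procedure used in this work: the function names are hypothetical, and the module labels given to WWOX and GSK3B are placeholders borrowed from their BP annotations, because the text does not assign them to named modules.

```python
# Protein -> module label. POLR2B and MEN1 follow the text; the labels for
# WWOX and GSK3B are placeholder assumptions taken from their BP annotations.
MODULE_OF = {
    "POLR2B": "RNA polymerase II",
    "MEN1": "MLL complex",
    "WWOX": "steroid metabolic process",
    "GSK3B": "glycogen metabolic process",
}


def classify_ppi(protein_a, protein_b, module_of=MODULE_OF):
    """Return 'intra-module' if both proteins share a module, else 'inter-module'."""
    same_module = module_of[protein_a] == module_of[protein_b]
    return "intra-module" if same_module else "inter-module"


def module_network(ppis, module_of=MODULE_OF):
    """Collapse inter-module PPIs into a set of undirected module-module edges."""
    edges = set()
    for protein_a, protein_b in ppis:
        module_a, module_b = module_of[protein_a], module_of[protein_b]
        if module_a != module_b:
            edges.add(frozenset((module_a, module_b)))
    return edges


# The three inter-module PPIs named in the text (Fig. 15).
ppis = [("POLR2B", "MEN1"), ("POLR2B", "WWOX"), ("POLR2B", "GSK3B")]
labels = {ppi: classify_ppi(*ppi) for ppi in ppis}
network = module_network(ppis)

for ppi, label in labels.items():
    print("-".join(ppi), label)
```

Storing each module-module edge as a `frozenset` keeps the network undirected, so POLR2B-MEN1 and MEN1-POLR2B map to the same edge.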
Multiple endocrine neoplasia type 1 (MEN1) is a subunit of the mixed-lineage leukemia (MLL) complex; MLL is a proto-oncogene implicated in development and leukemia pathogenesis [68, 69]. WWOX contains two WW domains at its N-terminus and plays a role in regulating steroid metabolism [70]. Glycogen synthase kinase-3 β (GSK3B) is a serine-threonine kinase with potent tumour-suppressor qualities that regulates glucose storage and cell proliferation [71, 72]. In this section, we present a real case of module-module interaction between the RNA polymerase II module and the MLL1 complex module.
The RNA polymerase II module carries out transcription, the process of creating a complementary RNA copy of a DNA sequence. The MLL core complex uses a non-processive mechanism to catalyze multiple lysine methylations of histones, an important epigenetic indexing system for transcriptionally active and inactive chromatin domains in eukaryotic genomes [73]. Based on our concept of the module family, we identified the module families of RNA polymerase and the MLL complex. The module family of RNA polymerase was described above (Fig. 13 and 14). The MLL complex module in Homo sapiens consists of six components: histone-lysine N-methyltransferase MLL (MLL), menin (MEN1), Set1/Ash2 histone methyltransferase complex subunit ASH2 (ASH2L), retinoblastoma-binding protein 5 (RBBP5), WD repeat-containing protein 82 (WDR82), and WD repeat-containing protein 5 (WDR5). The MLL complex module family contains two homologous modules (6 proteins and 15 PPIs) in Homo sapiens, one module (6 proteins and 15 PPIs) in Drosophila melanogaster, and one module (5 proteins and 10 PPIs) in Saccharomyces cerevisiae. Interestingly, we found that histone-lysine N-methyltransferase MLL2 (MLL2) is a homolog of MLL1 in Homo sapiens and could replace MLL1 to form the MLL complex. However, Drosophila melanogaster and Saccharomyces cerevisiae each have only one homolog: histone-lysine N-methyltransferase trithorax (trx) and histone-lysine N-methyltransferase, H3 lysine-4 specific (SET1), respectively [74]. In addition, menin activates the transcription of differentiation-regulating genes through covalent histone modification, and this activity is related to tumor suppression by MEN1 [75-78]. Menin in the MLL complex associates with RNA polymerase II in Homo sapiens [79] and Drosophila melanogaster.
However, no menin homolog is found in the Saccharomyces cerevisiae genome. Instead, SET1 takes over part of the interaction between the RNA polymerase II module and the MLL complex module in Saccharomyces cerevisiae (Fig. 15). Our results thus reveal diversity not only in intra-module interactions but also in inter-module interactions between different organisms. The module family is useful for studying homologous modules on a cross-genome scale and helps biologists understand the evolution of modules and the behavior of the interactome.
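The module sizes quoted for the MLL complex module family (6 proteins with 15 PPIs, 5 proteins with 10 PPIs) coincide with n(n-1)/2, the number of distinct pairs among n proteins. The short check below makes that arithmetic explicit. Reading these modules as fully connected is an assumption drawn from the counts alone, and the helper name is hypothetical.

```python
from itertools import combinations


def max_internal_ppis(n_proteins):
    """Number of distinct protein pairs in a module of n_proteins members."""
    return n_proteins * (n_proteins - 1) // 2


# The six components of the human MLL complex module listed in the text.
mll_human = ["MLL", "MEN1", "ASH2L", "RBBP5", "WDR82", "WDR5"]
all_pairs = list(combinations(mll_human, 2))

print(len(all_pairs), max_internal_ppis(6), max_internal_ppis(5))
```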
Tables
Table 1. The list of the number of modules in TOP 20 organisms from KEGG MODULE database
KEGG Taxonomy ID | NCBI Taxonomy ID | Organism code | Organism | No. of modules in KEGG MODULE database
T00772 | 507522 | kpe | Klebsiella pneumoniae 342 | 141
T00566 | 272620 | kpn | Klebsiella pneumoniae subsp. pneumoniae MGH 78578 | 141
T00910 | 484021 | kpu | Klebsiella pneumoniae NTUH-K2044 | 139
T01170 | 640131 | kva | Klebsiella variicola At-22 | 138
T01342 | 701347 | esc | Enterobacter cloacae SCF1 | 135
T00044 | 155864 | ece | Escherichia coli O157:H7 EDL933 | 134
T00672 | 439855 | ecm | Escherichia coli SMS-3-5 | 134
T00507 | 399742 | ent | Enterobacter sp. 638 | 133
T00784 | 409438 | ecy | Escherichia coli O152:H28 SE11 | 132
T00949 | 544404 | etw | Escherichia coli O157:H7 TW14359 | 132
T00831 | 585056 | eum | Escherichia coli O17:K52:H18 UMN026 | 132
T01422 | 741091 | rah | Rahnella sp. Y9602 | 132
T00778 | 444450 | ecf | Escherichia coli O157:H7 EC4115 | 131
T00338 | 364106 | eci | Escherichia coli O18:K1:H7 UTI89 | 131
T00829 | 585057 | ect | Escherichia coli O7:K1 IAI39 | 131
T00591 | 331112 | ecx | Escherichia coli O9 HS | 131
T01098 | 573235 | eoj | Escherichia coli O26:H11 11368 | 131
T00068 | 316407 | ecj | Escherichia coli K-12 W3110 | 130
T00828 | 585034 | ecr | Escherichia coli O8 IAI1 | 130
T00048 | 386585 | ecs | Escherichia coli O157:H7 Sakai | 130
Table 2. The data sets used for the definition and verification of module families
Data set | Comments
MIPS CORUM database [20] | Used as the module template set. CORUM provides manually annotated protein complexes from mammalian organisms; each complex assembles multiple proteins to perform a biological function.
Annotated PPI database | 275,787 experimental PPIs collected from IntAct [21], BioGRID [22], DIP [23], MIPS [24], and MINT [25]
Predicted homologous PPI set | Our previous sequence-based and structure-based homologous PPIs with joint E-value ≤ 10^-40 [12] and Z-score ≥ 3 [14], comprising 290,137 sequence-based PPI families and 86,252 structure-based PPI families
Integr8 database [15] | A complete genomic database (Integr8 version 103, containing 6,352,363 protein sequences in 2,274 species)
KEGG MODULE database [11] | KEGG organism-specific modules, each defined through a set of orthologs as a tight functional unit or complex in a pathway
Gene Ontology (GO) database [17] | GO biological process (BP) terms are used to annotate homologous modules, and GO molecular function (MF) terms to annotate the core components of module families.
Extended module data set | Each original module extended by one layer of PPIs and proteins, for each protein in the module, through homologous PPIs
Random data sets | For each module template, 50 random modules were constructed, each with the same number of proteins as the template, selected randomly from the genome of the template's organism.
PORC ortholog database [15] | PORC (putative orthologous clusters) are orthologous families defined in the Integr8 database.
Essential genes database (DEG) [16] | 11,384 essential proteins in 25 species (8 eukaryotes and 17 prokaryotes) collected from DEG (version 6.5)
EP8364 set | 8,364 essential proteins from the DEG database that have at least one GO MF or GO BP term
CG27 set | 160,598 proteins in 27 complete genomes (25 species in the DEG database and 2 species in the module template set) derived from the Integr8 database
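The construction of the random data sets in Table 2 (50 random modules per template, each with as many proteins as the template, drawn from the genome of the template's organism) can be sketched as below. This is an illustration under stated assumptions: the function name, the protein identifiers, and the choice of sampling without replacement are hypothetical, since the table does not specify them.

```python
import random


def random_modules(genome_proteins, template_size, n_random=50, seed=None):
    """Draw n_random random modules of template_size proteins from one genome.

    Sampling within a module is without replacement (an assumption), so a
    protein appears at most once per random module.
    """
    rng = random.Random(seed)
    return [rng.sample(genome_proteins, template_size) for _ in range(n_random)]


# Hypothetical genome of 1,000 protein identifiers and a 6-protein template.
genome = [f"P{i:05d}" for i in range(1000)]
modules = random_modules(genome, template_size=6, seed=0)

print(len(modules), len(modules[0]))
```

Passing a fixed `seed` makes the random background reproducible, which helps when the same random modules must be reused across comparisons.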
Table 3. Modified division groups from NCBI taxonomy database
Division group | Division code a | Division name b | Number of species used in module family
MAM | PRI, ROD, MAM | Primates, Rodents, Mammals | 4
VRT | VRT | Vertebrates | 3
INV | INV | Invertebrates | 27
PLN c | PLN | Plants | 42
BCT | BCT | Bacteria | 1,596
N/A d | PHG, VRL, SYN, UNA, ENV | Phages, Viruses, Synthetic, Unassigned, Environmental samples | 7
a,b The division names and codes are derived from the NCBI taxonomy database [30] (ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz).
c The PLN division group includes plants and fungi (e.g. Saccharomyces cerevisiae).
d Because only 478 (< 1%) of the 53,529 homologous modules (1,679 species) belong to phages, viruses, synthetic, unassigned, or environmental samples, we excluded these divisions.
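The regrouping in Table 3 amounts to a lookup from NCBI division code to modified division group. The sketch below encodes that lookup and checks the table's arithmetic against footnote d. It assumes that the counts 4 and 7 are totals for the whole MAM and N/A groups, which is consistent with the 1,679 species quoted in footnote d; the names used here are hypothetical.

```python
# NCBI division code -> modified division group (None marks excluded divisions).
DIVISION_GROUP = {
    "PRI": "MAM", "ROD": "MAM", "MAM": "MAM",
    "VRT": "VRT",
    "INV": "INV",
    "PLN": "PLN",  # plants and fungi (footnote c)
    "BCT": "BCT",
    "PHG": None, "VRL": None, "SYN": None, "UNA": None, "ENV": None,
}

# Species per group as listed in Table 3 (assumed to be group totals).
SPECIES_PER_GROUP = {"MAM": 4, "VRT": 3, "INV": 27, "PLN": 42, "BCT": 1596, "N/A": 7}


def division_group(code):
    """Return the modified division group, or None for an excluded division."""
    return DIVISION_GROUP[code]


total_species = sum(SPECIES_PER_GROUP.values())
excluded_fraction = 478 / 53529  # homologous modules in excluded divisions

print(total_species, f"{excluded_fraction:.2%}")
```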
Table 4. The 181 essential GO molecular functions (MF) terms
GO ID GO term Classification
Number of
GO:0004820 glycine-tRNA ligase activity Translation 53 0.0003 24 0.0029 8.6948 0 0.0000 0.0000
GO:0000049 tRNA binding Translation 394 0.0025 172 0.0206 8.3822 2 0.0025 1.0089
GO:0004818 glutamate-tRNA ligase activity Translation 40 0.0002 17 0.0020 8.1605 1 0.0012 4.9690
GO:0004827 proline-tRNA ligase activity Translation 38 0.0002 16 0.0019 8.0847 1 0.0012 5.2305
GO:0004832 valine-tRNA ligase activity Translation 41 0.0003 17 0.0020 7.9614 0 0.0000 0.0000
GO:0004825 methionine-tRNA ligase activity Translation 38 0.0002 15 0.0018 7.5794 1 0.0012 5.2305
GO:0004814 arginine-tRNA ligase activity Translation 54 0.0003 21 0.0025 7.4671 1 0.0012 3.6807
GO:0004824 lysine-tRNA ligase activity Translation 49 0.0003 19 0.0023 7.4453 1 0.0012 4.0563
GO:0004826 phenylalanine-tRNA ligase activity Translation 88 0.0005 32 0.0038 6.9822 0 0.0000 0.0000
GO:0004823 leucine-tRNA ligase activity Translation 43 0.0003 15 0.0018 6.6981 0 0.0000 0.0000
GO:0016149 translation release factor activity, codon specific Translation 83 0.0005 28 0.0033 6.4775 1 0.0012 2.3947
GO:0004831 tyrosine-tRNA ligase activity Translation 45 0.0003 15 0.0018 6.4004 0 0.0000 0.0000
GO:0004822 isoleucine-tRNA ligase activity Translation 39 0.0002 13 0.0016 6.4004 1 0.0012 5.0964
GO:0004817 cysteine-tRNA ligase activity Translation 47 0.0003 15 0.0018 6.1280 0 0.0000 0.0000
GO:0004526 ribonuclease P activity Translation 71 0.0004 22 0.0026 5.9496 0 0.0000 0.0000
GO:0004829 threonine-tRNA ligase activity Translation 49 0.0003 15 0.0018 5.8779 0 0.0000 0.0000
GO:0004816 asparagine-tRNA ligase activity Translation 33 0.0002 10 0.0012 5.8185 0 0.0000 0.0000
a The CG27 set (160,598 proteins annotated with ≥ 1 GO MF term) consists of 25 species in DEG and 2 species in the module template set.
b The occur ratio of a GO MF term is defined as the number of proteins annotated with this term divided by the total number of proteins in the set.
c The unique ratio of a GO MF term is defined as its occur ratio in a set divided by its occur ratio in the 27-species genome set (CG27).
d The core components of module templates are the core components in module families with PPI evolution score ≥ 8 and at least one GO MF term annotation in the GO database.
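The occur ratio and unique ratio defined in footnotes b and c can be computed as in the sketch below. The column headers of Table 4 did not survive in this copy, so treating the second count column as the number of proteins in the EP8364 set (8,364 essential proteins, Table 2) is an assumption; it is supported only by the fact that the arithmetic reproduces the tabulated unique ratios. The function names are hypothetical.

```python
CG27_SIZE = 160_598   # proteins in the CG27 set (footnote a)
EP8364_SIZE = 8_364   # assumed denominator for the second count column


def occur_ratio(n_annotated, set_size):
    """Fraction of proteins in a set annotated with a given GO MF term."""
    return n_annotated / set_size


def unique_ratio(n_in_set, set_size, n_in_cg27, cg27_size=CG27_SIZE):
    """Occur ratio in a set divided by the occur ratio in the CG27 set."""
    return occur_ratio(n_in_set, set_size) / occur_ratio(n_in_cg27, cg27_size)


# GO:0004820 (glycine-tRNA ligase activity): 53 proteins in CG27, 24 in the
# second count column of Table 4.
glycine_trna = unique_ratio(24, EP8364_SIZE, 53)
# GO:0000049 (tRNA binding): 394 proteins in CG27, 172 in the second column.
trna_binding = unique_ratio(172, EP8364_SIZE, 394)

print(f"{glycine_trna:.4f}")  # 8.6948, the value listed in Table 4
print(f"{trna_binding:.4f}")  # 8.3822, the value listed in Table 4
```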
GO:0004807 triose-phosphate isomerase activity Carbohydrate metabolism 34 0.0002 11 0.0013 6.2121 0 0.0000 0.0000
GO:0004751 ribose-5-phosphate isomerase activity Carbohydrate metabolism 31 0.0002 10 0.0012 6.1939 0 0.0000 0.0000
GO:0004148 dihydrolipoyl dehydrogenase activity Carbohydrate metabolism 44 0.0003 14 0.0017 6.1094 0 0.0000 0.0000
GO:0004618 phosphoglycerate kinase activity Carbohydrate metabolism 42 0.0003 13 0.0016 5.9432 0 0.0000 0.0000
GO:0004742 dihydrolipoyllysine-residue acetyltransferase activity Carbohydrate metabolism 30 0.0002 9 0.0011 5.7603 0 0.0000 0.0000
GO:0004477 methenyltetrahydrofolate cyclohydrolase activity Carbohydrate metabolism 45 0.0003 13 0.0016 5.5470 0 0.0000 0.0000
GO:0004488 methylenetetrahydrofolate dehydrogenase (NADP+) activity Carbohydrate metabolism 42 0.0003 12 0.0014 5.4860 0 0.0000 0.0000
GO:0004802 transketolase activity Carbohydrate metabolism 44 0.0003 10 0.0012 4.3639 0 0.0000 0.0000
GO:0004634 phosphopyruvate hydratase activity Carbohydrate metabolism 58 0.0004 13 0.0016 4.3037 0 0.0000 0.0000
GO:0004347 glucose-6-phosphate isomerase activity Carbohydrate metabolism 37 0.0002 8 0.0010 4.1516 0 0.0000 0.0000
GO:0003983 UTP:glucose-1-phosphate uridylyltransferase activity Carbohydrate metabolism 32 0.0002 6 0.0007 3.6002 0 0.0000 0.0000
GO:0004615 phosphomannomutase activity Carbohydrate metabolism 32 0.0002 6 0.0007 3.6002 0 0.0000 0.0000
GO:0004750 ribulose-phosphate 3-epimerase activity Carbohydrate metabolism 52 0.0003 9 0.0011 3.3233 0 0.0000 0.0000
GO:0004365 glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity Carbohydrate metabolism 84 0.0005 14 0.0017 3.2002 1 0.0012 2.3662
GO:0017176 phosphatidylinositol N-acetylglucosaminyltransferase activity Carbohydrate metabolism 33 0.0002 5 0.0006 2.9093 3 0.0037 18.0691
GO:0004739 pyruvate dehydrogenase (acetyl-transferring) activity Carbohydrate metabolism 53 0.0003 8 0.0010 2.8983 0 0.0000 0.0000
GO:0004619 phosphoglycerate mutase activity Carbohydrate metabolism 50 0.0003 7 0.0008 2.6882 0 0.0000 0.0000
GO:0004332 fructose-bisphosphate aldolase activity Carbohydrate metabolism 86 0.0005 12 0.0014 2.6792 0 0.0000 0.0000
GO:0042132 fructose 1,6-bisphosphate 1-phosphatase activity Carbohydrate metabolism 39 0.0002 5 0.0006 2.4617 0 0.0000 0.0000
GO:0004579 dolichyl-diphosphooligosaccharide-protein glycotransferase activity Carbohydrate metabolism 64 0.0004 8 0.0010 2.4001 3 0.0037 9.3169
GO:0003872 6-phosphofructokinase activity Carbohydrate metabolism 58 0.0004 7 0.0008 2.3174 0 0.0000 0.0000
GO:0050661 NADP or NADPH binding Carbohydrate metabolism 449 0.0028 51 0.0061 2.1810 1 0.0012 0.4427
GO:0004743 pyruvate kinase activity Carbohydrate metabolism 72 0.0004 8 0.0010 2.1335 0 0.0000 0.0000
GO:0004614 phosphoglucomutase activity Carbohydrate metabolism 36 0.0002 4 0.0005 2.1335 0 0.0000 0.0000
GO:0000104 succinate dehydrogenase activity Carbohydrate metabolism 55 0.0003 6 0.0007 2.0947 2 0.0025 7.2276
GO:0004591 oxoglutarate dehydrogenase (succinyl-transferring) activity Carbohydrate metabolism 47 0.0003 5 0.0006 2.0427 0 0.0000 0.0000
GO:0003705 sequence-specific enhancer binding RNA polymerase II transcription factor activity Transcription 151 0.0009 19 0.0023 2.4160 0 0.0000 0.0000
GO:0031071 cysteine desulfurase activity Amino acid metabolism 39 0.0002 14 0.0017 6.8927 0 0.0000 0.0000
GO:0004478 methionine adenosyltransferase activity Amino acid metabolism 51 0.0003 14 0.0017 5.2709 0 0.0000 0.0000
GO:0004764 shikimate 5-dehydrogenase activity Amino acid metabolism 45 0.0003 12 0.0014 5.1203 0 0.0000 0.0000
GO:0004834 tryptophan synthase activity Amino acid metabolism 39 0.0002 9 0.0011 4.4310 0 0.0000 0.0000
GO:0004360 glutamine-fructose-6-phosphate transaminase (isomerizing) activity Amino acid metabolism 35 0.0002 8 0.0010 4.3888 0 0.0000 0.0000
GO:0003886 DNA (cytosine-5-)-methyltransferase activity Amino acid metabolism 36 0.0002 6 0.0007 3.2002 2 0.0025 11.0422
GO:0004372 glycine hydroxymethyltransferase activity Amino acid metabolism 61 0.0004 10 0.0012 3.1477 0 0.0000 0.0000
GO:0003861 3-isopropylmalate dehydratase activity Amino acid metabolism 34 0.0002 5 0.0006 2.8237 0 0.0000 0.0000
GO:0004072 aspartate kinase activity Amino acid metabolism 37 0.0002 5 0.0006 2.5947 0 0.0000 0.0000
GO:0004765 shikimate kinase activity Amino acid metabolism 52 0.0003 7 0.0008 2.5848 0 0.0000 0.0000
GO:0004349 glutamate 5-kinase activity Amino acid metabolism 30 0.0002 4 0.0005 2.5601 0 0.0000 0.0000
GO:0004049 anthranilate synthase activity Amino acid metabolism 53 0.0003 6 0.0007 2.1737 0 0.0000 0.0000
GO:0004427 inorganic diphosphatase activity Oxidative phosphorylation 65 0.0004 14 0.0017 4.1356 0 0.0000 0.0000
GO:0051538 3 iron, 4 sulfur cluster binding Oxidative phosphorylation 43 0.0003 8 0.0010 3.5723 3 0.0037 13.8670
GO:0046933 hydrogen ion transporting ATP synthase activity, rotational mechanism Oxidative phosphorylation 280 0.0017 48 0.0057 3.2916 5 0.0062 3.5493
GO:0046961 proton-transporting ATPase activity, rotational mechanism Oxidative phosphorylation 285 0.0018 42 0.0050 2.8296 5 0.0062 3.4870
GO:0008553 hydrogen-exporting ATPase activity, phosphorylative mechanism Oxidative phosphorylation 86 0.0005 9 0.0011 2.0094 1 0.0012 2.3112
GO:0004798 thymidylate kinase activity Pyrimidine metabolism 33 0.0002 12 0.0014 6.9822 0 0.0000 0.0000
GO:0004799 thymidylate synthase activity Pyrimidine metabolism 33 0.0002 12 0.0014 6.9822 0 0.0000 0.0000
GO:0004791 thioredoxin-disulfide reductase activity Pyrimidine metabolism 49 0.0003 9 0.0011 3.5267 0 0.0000 0.0000
GO:0015450 P-P-bond-hydrolysis-driven protein transmembrane transporter activity Others 189 0.0012 50 0.0060 5.0797 1 0.0012 1.0516
GO:0016836 hydro-lyase activity Others 40 0.0002 10 0.0012 4.8003 0 0.0000 0.0000
GO:0016783 sulfurtransferase activity Others 33 0.0002 8 0.0010 4.6548 0 0.0000 0.0000
GO:0004594 pantothenate kinase activity Others 48 0.0003 11 0.0013 4.4003 0 0.0000 0.0000
GO:0016624 oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor Others 40 0.0002 9 0.0011 4.3202 0 0.0000 0.0000
GO:0008312 7S RNA binding Others 76 0.0005 17 0.0020 4.2950 1 0.0012 2.6153
GO:0008658 penicillin binding Others 86 0.0005 18 0.0022 4.0188 0 0.0000 0.0000
GO:0009374 biotin binding Others 109 0.0007 22 0.0026 3.8755 0 0.0000 0.0000
GO:0016709 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, NADH or NADPH as one donor, and incorporation of one atom of oxygen Others 50 0.0003 9 0.0011 3.4562 0 0.0000 0.0000
GO:0046914 transition metal ion binding Others 106 0.0007 19 0.0023 3.4417 0 0.0000 0.0000
GO:0003951 NAD+ kinase activity Others 67 0.0004 12 0.0014 3.4390 0 0.0000 0.0000
GO:0016820 hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances Others 74 0.0005 13 0.0016 3.3732 0 0.0000 0.0000
GO:0051087 chaperone binding Others 137 0.0009 24 0.0029 3.3637 3 0.0037 4.3524
GO:0016884 carbon-nitrogen ligase activity, with glutamine as amido-N-donor Others 109 0.0007 19 0.0023 3.3470 0 0.0000 0.0000
Table 5. Validation of unannotated proteins in core components by the orthology database (PORC) and essential GO MF terms
Sets (interface evolution score) | Number of total proteins in core components | Number of unannotated proteins in core components | Validated by orthologs (PORC) a | Validated by 181 essential GO MF terms b | Validated by children of 181 essential GO MF terms c | Total
≥ 9 | 146 | 33 | 12 | 10 | 18 | 24 (73%)
≥ 8 | 850 | 400 | 76 | 101 | 116 | 198 (50%)
a The number of proteins that have at least one orthologous protein in the PORC orthology database [15] recorded as an essential protein in DEG.
b The number of proteins that are annotated with at least one essential GO MF term.
c The number of proteins that are annotated with at least one child of the 181 essential GO MF terms.
Table 6. The proteins of core components in templates with interface evolution score ≥ 9.
Uniprot
Sequence similarity d Essential GO MF ID
Q8K2B3 10090_440 10 A0Q8D0 DEG10120353 1e-171 0.520
- - GO:0008177
Q02K68 DEG10150207 1e-167 0.520
O75489
O01602 DEG20020046 3e-79 0.627
- - GO:0008137
A0Q8H0 DEG10120379 1e-33 0.590
O75306
Q9CQA3 10090_440 10 Q6F8L0 DEG10130346 4e-71 0.567
GO:0051538 3 iron, 4 sulfur
cluster binding GO:0008177
Q02K69 DEG10150206 1e-70 0.560
Q3T189 9913_446 10 Q6F8L0 DEG10130346 1e-71 0.569
GO:0051538 3 iron, 4 sulfur
cluster binding GO:0008177
Q02K69 DEG10150206 5e-71 0.552
P52701
a The module family ID is a combination of the taxonomy ID and the CORUM ID.
b The orthologs of the core protein that are recorded in the PORC database and are essential proteins in the DEG database.
c The essential protein ID in DEG. Each ID represents an ortholog of a core component that is regarded as an essential protein. For example, the DEG ID of Q8K2B3 (Uniprot AC) is DEG10120353.
d The sequence similarity (BLASTP E-value and sequence identity) of the orthologs, using the protein of the core components as the query.
e A GO MF term that is a child of an essential GO MF term can be considered an essential GO MF term.
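Footnote a describes the module family ID as a combination of the taxonomy ID and the CORUM ID. The IDs that appear in Table 6 (10090_440, 9913_446, 9606_280) suggest that the two parts are joined by an underscore, with the taxonomy ID first. A minimal parsing sketch under that assumption follows; the type and function names are hypothetical.

```python
from typing import NamedTuple


class ModuleFamilyID(NamedTuple):
    taxonomy_id: int  # NCBI taxonomy ID of the template's organism
    corum_id: int     # CORUM complex ID of the module template


def parse_module_family_id(module_family_id: str) -> ModuleFamilyID:
    """Split an ID such as '10090_440' into its taxonomy and CORUM parts."""
    taxonomy, corum = module_family_id.split("_")
    return ModuleFamilyID(int(taxonomy), int(corum))


ids = [parse_module_family_id(s) for s in ("10090_440", "9913_446", "9606_280")]

for parsed in ids:
    print(parsed.taxonomy_id, parsed.corum_id)
```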
GO:0003682 Chromatin binding GO:0008270 GO:0046914 GO:0004003
GO:0003682 Chromatin binding GO:0008270 GO:0046914 GO:0004003
P04406 9606_280 9.96 Q6F9D5 DEG10130309; 8e-50 0.376 GO:0004365