Major contributions and future works - Template-based homologous modules built on protein-protein interaction families

Chapter 4 Conclusions

4.2 Major contributions and future works

To our knowledge, the module family, which comprises a group of homologous modules, is the first approach that identifies homologous modules of a module template across complete genomes through PPI families. We have developed a new method to identify homologous modules based on module templates derived from manually annotated protein complexes and crystal structures. Furthermore, the conserved and divergent internal PPIs of homologous modules provide clues for inferring the essential elements of modules.

To study the origin and diversity of novel phenotypes, we will focus on two questions: “What are the essential elements of life?” and “How is a new species formed?”

Some modules are evolutionarily cohesive; in other words, these cohesive modules are conserved across multiple species [64]. The relationships between connected modules allow the construction of a module-module interaction network, which is regarded as the connection between different functional modules in the interactome [65]. Mutations in intra-module proteins have less widespread effects, whereas inter-module proteins, which integrate different modules, have higher rates of amino-acid substitution [66, 67]. According to these previous studies, inter-module interactions undergo more evolutionary modifications than intra-module interactions.

Inter-module interactions of the RNA polymerase II module in human are mediated by protein-protein interactions, such as POLR2B-MEN1, POLR2B-WWOX, and POLR2B-GSK3B (Fig. 15). In other words, the inter-module proteins interacting with POLR2B, namely MEN1, WWOX, and GSK3B, participate in other processes annotated with the BP terms proliferation, steroid metabolic process, and glycogen metabolic process, respectively.

Multiple endocrine neoplasia type 1 (MEN1) is a subunit of the mixed-lineage leukemia (MLL) complex, a proto-oncogene implicated in development and leukemia pathogenesis [68, 69]. WWOX contains two WW domains at its N-terminus and plays a role in regulating steroid metabolism [70]. Glycogen synthase kinase 3β (GSK3B) is a serine-threonine kinase with potent tumour suppressor qualities that regulates glucose storage and cell proliferation [71, 72]. In this section, we present a real case of the module-module interaction between the RNA polymerase II module and the MLL1 complex module.

The RNA polymerase II module is involved in transcription, the process of creating a complementary RNA copy of a DNA sequence. The MLL core complex uses a non-processive mechanism to catalyze multiple lysine methylations of histones, which form an important epigenetic indexing system for transcriptionally active and inactive chromatin domains in eukaryotic genomes [73]. Based on our concept of the module family, we identified the module families of RNA polymerase and the MLL complex. The module family of RNA polymerase was described above (Fig. 13 and 14). The MLL complex module in Homo sapiens consists of six components: histone-lysine N-methyltransferase MLL (MLL), menin (MEN1), Set1/Ash2 histone methyltransferase complex subunit ASH2 (ASH2L), retinoblastoma-binding protein 5 (RBBP5), WD repeat-containing protein 82 (WDR82), and WD repeat-containing protein 5 (WDR5). The MLL complex module family contains two homologous modules (6 proteins and 15 PPIs) in Homo sapiens, one module (6 proteins and 15 PPIs) in Drosophila melanogaster, and one module (5 proteins and 10 PPIs) in Saccharomyces cerevisiae. Interestingly, we found that histone-lysine N-methyltransferase MLL2 (MLL2) is a homolog of MLL1 in Homo sapiens and could replace MLL1 to form the MLL complex. However, Drosophila melanogaster and Saccharomyces cerevisiae each have only one homolog, histone-lysine N-methyltransferase trithorax (trx) and histone-lysine N-methyltransferase, H3 lysine-4 specific (SET1), respectively [74]. In addition, menin activates the transcription of differentiation-regulating genes by covalent histone modification, and this activity is related to tumor suppression by MEN1 [75-78]. Menin in the MLL complex associates with RNA polymerase II in Homo sapiens [79] and Drosophila melanogaster.

However, no menin homologs are found in the Saccharomyces cerevisiae genome. Instead, SET1 takes over the interaction between the RNA polymerase II module and the MLL complex module in Saccharomyces cerevisiae (Fig. 15). According to our results, both intra-module and inter-module interactions are diverse across organisms. Identifying homologous modules on a cross-genome scale is therefore useful and helps biologists understand the evolution of modules and the behavior of the interactome.
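As a quick check on the counts above: 15 and 10 are exactly the numbers of distinct protein pairs among 6 and 5 components, so the reported PPI counts are what one would see if every pair of components in a module were connected. That reading is an assumption made here for illustration, as are the function and variable names in the following Python sketch.

```python
from itertools import combinations


def max_internal_ppis(n_proteins: int) -> int:
    """Number of distinct protein pairs in a module of n_proteins components."""
    return n_proteins * (n_proteins - 1) // 2


# Components of the human MLL complex module as listed in the text.
mll_human = ["MLL", "MEN1", "ASH2L", "RBBP5", "WDR82", "WDR5"]

# Enumerate every unordered pair of components (assumed fully connected).
human_pairs = list(combinations(mll_human, 2))

print(len(human_pairs))      # pairs among the 6 human components
print(max_internal_ppis(5))  # pairs among the 5 yeast components
```

Under this assumption, the loss of one component (no menin homolog in Saccharomyces cerevisiae) removes all five of its pairings, which accounts for the drop from 15 to 10 PPIs.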

Tables

Table 1. The number of modules in the top 20 organisms from the KEGG MODULE database

| KEGG Taxonomy ID | NCBI Taxonomy ID | Organism code | Organism | No. of modules in KEGG MODULE database |
|---|---|---|---|---|
| T00772 | 507522 | kpe | Klebsiella pneumoniae 342 | 141 |
| T00566 | 272620 | kpn | Klebsiella pneumoniae subsp. pneumoniae MGH 78578 | 141 |
| T00910 | 484021 | kpu | Klebsiella pneumoniae NTUH-K2044 | 139 |
| T01170 | 640131 | kva | Klebsiella variicola At-22 | 138 |
| T01342 | 701347 | esc | Enterobacter cloacae SCF1 | 135 |
| T00044 | 155864 | ece | Escherichia coli O157:H7 EDL933 | 134 |
| T00672 | 439855 | ecm | Escherichia coli SMS-3-5 | 134 |
| T00507 | 399742 | ent | Enterobacter sp. 638 | 133 |
| T00784 | 409438 | ecy | Escherichia coli O152:H28 SE11 | 132 |
| T00949 | 544404 | etw | Escherichia coli O157:H7 TW14359 | 132 |
| T00831 | 585056 | eum | Escherichia coli O17:K52:H18 UMN026 | 132 |
| T01422 | 741091 | rah | Rahnella sp. Y9602 | 132 |
| T00778 | 444450 | ecf | Escherichia coli O157:H7 EC4115 | 131 |
| T00338 | 364106 | eci | Escherichia coli O18:K1:H7 UTI89 | 131 |
| T00829 | 585057 | ect | Escherichia coli O7:K1 IAI39 | 131 |
| T00591 | 331112 | ecx | Escherichia coli O9 HS | 131 |
| T01098 | 573235 | eoj | Escherichia coli O26:H11 11368 | 131 |
| T00068 | 316407 | ecj | Escherichia coli K-12 W3110 | 130 |
| T00828 | 585034 | ecr | Escherichia coli O8 IAI1 | 130 |
| T00048 | 386585 | ecs | Escherichia coli O157:H7 Sakai | 130 |

Table 2. The data sets used for the definition and verification of module families

| Data set | Comments |
|---|---|
| MIPS CORUM database [20] | The CORUM database, used as the module template set, provides manually annotated protein complexes from mammalian organisms; each complex assembles multiple proteins to perform a biological function. |
| Annotated PPI database | 275,787 experimental PPIs in the annotated PPI databases (IntAct [21], BioGRID [22], DIP [23], MIPS [24], and MINT [25]) |
| Predicted homologous PPI set | Our previous sequence-based and structure-based homologous PPIs with joint E-value ≤ 10^-40 [12] and Z-score ≥ 3 [14], including 290,137 sequence-based PPI families and 86,252 structure-based PPI families |
| Integr8 database [15] | A complete genomic database (Integr8 version 103, containing 6,352,363 protein sequences in 2,274 species) |
| KEGG MODULE database [11] | KEGG organism-specific modules are defined as tight functional units and complexes in a pathway through a set of orthologs. |
| Gene Ontology (GO) database [17] | We use GO biological process (BP) terms to annotate homologous modules and GO molecular function (MF) terms to annotate the core components of module families. |
| Extended module data set | Each original module is extended by one layer of PPIs and proteins for each of its proteins through homologous PPIs. |
| Random data sets | For each module template, 50 random modules were constructed; each random module contains the same number of proteins as the module template, selected randomly from the genome of the template's organism. |
| PORC ortholog database [15] | PORC (putative orthologous clusters) are defined as orthologous families from the Integr8 database. |
| Essential genes database (DEG) [16] | We collected 11,384 essential proteins in 25 species from the DEG database (version 6.5), including 8 eukaryotes and 17 prokaryotes. |
| EP8364 set | We collected 8,364 essential proteins (called the EP8364 set) from the DEG database with at least one GO MF or GO BP term. |
| CG27 set | 160,598 proteins (called the CG27 set) in 27 complete genomes (25 species in the DEG database and 2 species in the module template set) derived from the Integr8 database |
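The random data sets described in Table 2 can be sketched as follows. This is a minimal illustration, assuming that each random module is drawn without replacement from the proteins of the template organism's genome; the function name, the placeholder protein identifiers, and the seed are hypothetical and not taken from the original work.

```python
import random


def make_random_modules(genome_proteins, module_size, n_random=50, seed=None):
    """Draw n_random random modules, each with module_size distinct proteins.

    genome_proteins: list of protein identifiers from the template's organism.
    module_size: number of proteins in the module template.
    """
    rng = random.Random(seed)
    return [rng.sample(genome_proteins, module_size) for _ in range(n_random)]


# Hypothetical genome of 1,000 proteins and a template with 6 components.
genome = [f"P{i:05d}" for i in range(1000)]
random_modules = make_random_modules(genome, module_size=6, seed=0)

print(len(random_modules), len(random_modules[0]))
```

Fixing the seed makes the random background reproducible, which helps when the same random modules must be compared against several module templates.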

Table 3. Modified division groups from NCBI taxonomy database

| Division group | Division code ^a | Division name ^b | Number of species used in module family |
|---|---|---|---|
| MAM | PRI | Primates | |
| | ROD | Rodents | |
| | MAM | Mammals | |
| VRT | VRT | Vertebrates | 3 |
| INV | INV | Invertebrates | 27 |
| PLN ^c | PLN | Plants | 42 |
| BCT | BCT | Bacteria | 1,596 |
| N/A ^d | PHG | Phages | |
| | VRL | Viruses | |
| | SYN | Synthetic | |
| | UNA | Unassigned | |
| | ENV | Environmental samples | |

a,b The division names and codes are derived from the NCBI taxonomy database [30] (ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz).

c The PLN division group includes plants and fungi (e.g. Saccharomyces cerevisiae).

d Because only 478 (< 1%) of the 53,529 homologous modules (1,679 species) belong to phages, viruses, synthetic, unassigned, and environmental samples, we excluded these divisions.

Table 4. The 181 essential GO molecular functions (MF) terms

| GO ID | GO term | Classification | No. of proteins in CG27 set ^a | Occur ratio ^b (CG27) | No. of proteins in EP8364 set | Occur ratio ^b (EP8364) | Unique ratio ^c (EP8364) | No. of core components ^d | Occur ratio ^b (core components) | Unique ratio ^c (core components) |
|---|---|---|---|---|---|---|---|---|---|---|
| GO:0004820 | glycine-tRNA ligase activity | Translation | 53 | 0.0003 | 24 | 0.0029 | 8.6948 | 0 | 0.0000 | 0.0000 |
| GO:0000049 | tRNA binding | Translation | 394 | 0.0025 | 172 | 0.0206 | 8.3822 | 2 | 0.0025 | 1.0089 |
| GO:0004818 | glutamate-tRNA ligase activity | Translation | 40 | 0.0002 | 17 | 0.0020 | 8.1605 | 1 | 0.0012 | 4.9690 |
| GO:0004827 | proline-tRNA ligase activity | Translation | 38 | 0.0002 | 16 | 0.0019 | 8.0847 | 1 | 0.0012 | 5.2305 |
| GO:0004832 | valine-tRNA ligase activity | Translation | 41 | 0.0003 | 17 | 0.0020 | 7.9614 | 0 | 0.0000 | 0.0000 |
| GO:0004825 | methionine-tRNA ligase activity | Translation | 38 | 0.0002 | 15 | 0.0018 | 7.5794 | 1 | 0.0012 | 5.2305 |
| GO:0004814 | arginine-tRNA ligase activity | Translation | 54 | 0.0003 | 21 | 0.0025 | 7.4671 | 1 | 0.0012 | 3.6807 |
| GO:0004824 | lysine-tRNA ligase activity | Translation | 49 | 0.0003 | 19 | 0.0023 | 7.4453 | 1 | 0.0012 | 4.0563 |
| GO:0004826 | phenylalanine-tRNA ligase activity | Translation | 88 | 0.0005 | 32 | 0.0038 | 6.9822 | 0 | 0.0000 | 0.0000 |
| GO:0004823 | leucine-tRNA ligase activity | Translation | 43 | 0.0003 | 15 | 0.0018 | 6.6981 | 0 | 0.0000 | 0.0000 |
| GO:0016149 | translation release factor activity, codon specific | Translation | 83 | 0.0005 | 28 | 0.0033 | 6.4775 | 1 | 0.0012 | 2.3947 |
| GO:0004831 | tyrosine-tRNA ligase activity | Translation | 45 | 0.0003 | 15 | 0.0018 | 6.4004 | 0 | 0.0000 | 0.0000 |
| GO:0004822 | isoleucine-tRNA ligase activity | Translation | 39 | 0.0002 | 13 | 0.0016 | 6.4004 | 1 | 0.0012 | 5.0964 |
| GO:0004817 | cysteine-tRNA ligase activity | Translation | 47 | 0.0003 | 15 | 0.0018 | 6.1280 | 0 | 0.0000 | 0.0000 |
| GO:0004526 | ribonuclease P activity | Translation | 71 | 0.0004 | 22 | 0.0026 | 5.9496 | 0 | 0.0000 | 0.0000 |
| GO:0004829 | threonine-tRNA ligase activity | Translation | 49 | 0.0003 | 15 | 0.0018 | 5.8779 | 0 | 0.0000 | 0.0000 |
| GO:0004816 | asparagine-tRNA ligase activity | Translation | 33 | 0.0002 | 10 | 0.0012 | 5.8185 | 0 | 0.0000 | 0.0000 |
| GO:0004807 | triose-phosphate isomerase activity | Carbohydrate metabolism | 34 | 0.0002 | 11 | 0.0013 | 6.2121 | 0 | 0.0000 | 0.0000 |
| GO:0004751 | ribose-5-phosphate isomerase activity | Carbohydrate metabolism | 31 | 0.0002 | 10 | 0.0012 | 6.1939 | 0 | 0.0000 | 0.0000 |
| GO:0004148 | dihydrolipoyl dehydrogenase activity | Carbohydrate metabolism | 44 | 0.0003 | 14 | 0.0017 | 6.1094 | 0 | 0.0000 | 0.0000 |
| GO:0004618 | phosphoglycerate kinase activity | Carbohydrate metabolism | 42 | 0.0003 | 13 | 0.0016 | 5.9432 | 0 | 0.0000 | 0.0000 |
| GO:0004742 | dihydrolipoyllysine-residue acetyltransferase activity | Carbohydrate metabolism | 30 | 0.0002 | 9 | 0.0011 | 5.7603 | 0 | 0.0000 | 0.0000 |
| GO:0004477 | methenyltetrahydrofolate cyclohydrolase activity | Carbohydrate metabolism | 45 | 0.0003 | 13 | 0.0016 | 5.5470 | 0 | 0.0000 | 0.0000 |
| GO:0004488 | methylenetetrahydrofolate dehydrogenase (NADP+) activity | Carbohydrate metabolism | 42 | 0.0003 | 12 | 0.0014 | 5.4860 | 0 | 0.0000 | 0.0000 |
| GO:0004802 | transketolase activity | Carbohydrate metabolism | 44 | 0.0003 | 10 | 0.0012 | 4.3639 | 0 | 0.0000 | 0.0000 |
| GO:0004634 | phosphopyruvate hydratase activity | Carbohydrate metabolism | 58 | 0.0004 | 13 | 0.0016 | 4.3037 | 0 | 0.0000 | 0.0000 |
| GO:0004347 | glucose-6-phosphate isomerase activity | Carbohydrate metabolism | 37 | 0.0002 | 8 | 0.0010 | 4.1516 | 0 | 0.0000 | 0.0000 |
| GO:0003983 | UTP:glucose-1-phosphate uridylyltransferase activity | Carbohydrate metabolism | 32 | 0.0002 | 6 | 0.0007 | 3.6002 | 0 | 0.0000 | 0.0000 |
| GO:0004615 | phosphomannomutase activity | Carbohydrate metabolism | 32 | 0.0002 | 6 | 0.0007 | 3.6002 | 0 | 0.0000 | 0.0000 |
| GO:0004750 | ribulose-phosphate 3-epimerase activity | Carbohydrate metabolism | 52 | 0.0003 | 9 | 0.0011 | 3.3233 | 0 | 0.0000 | 0.0000 |
| GO:0004365 | glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity | Carbohydrate metabolism | 84 | 0.0005 | 14 | 0.0017 | 3.2002 | 1 | 0.0012 | 2.3662 |
| GO:0017176 | phosphatidylinositol N-acetylglucosaminyltransferase activity | Carbohydrate metabolism | 33 | 0.0002 | 5 | 0.0006 | 2.9093 | 3 | 0.0037 | 18.0691 |
| GO:0004739 | pyruvate dehydrogenase (acetyl-transferring) activity | Carbohydrate metabolism | 53 | 0.0003 | 8 | 0.0010 | 2.8983 | 0 | 0.0000 | 0.0000 |
| GO:0004619 | phosphoglycerate mutase activity | Carbohydrate metabolism | 50 | 0.0003 | 7 | 0.0008 | 2.6882 | 0 | 0.0000 | 0.0000 |
| GO:0004332 | fructose-bisphosphate aldolase activity | Carbohydrate metabolism | 86 | 0.0005 | 12 | 0.0014 | 2.6792 | 0 | 0.0000 | 0.0000 |
| GO:0042132 | fructose 1,6-bisphosphate 1-phosphatase activity | Carbohydrate metabolism | 39 | 0.0002 | 5 | 0.0006 | 2.4617 | 0 | 0.0000 | 0.0000 |
| GO:0004579 | dolichyl-diphosphooligosaccharide-protein glycotransferase activity | Carbohydrate metabolism | 64 | 0.0004 | 8 | 0.0010 | 2.4001 | 3 | 0.0037 | 9.3169 |
| GO:0003872 | 6-phosphofructokinase activity | Carbohydrate metabolism | 58 | 0.0004 | 7 | 0.0008 | 2.3174 | 0 | 0.0000 | 0.0000 |
| GO:0050661 | NADP or NADPH binding | Carbohydrate metabolism | 449 | 0.0028 | 51 | 0.0061 | 2.1810 | 1 | 0.0012 | 0.4427 |
| GO:0004743 | pyruvate kinase activity | Carbohydrate metabolism | 72 | 0.0004 | 8 | 0.0010 | 2.1335 | 0 | 0.0000 | 0.0000 |
| GO:0004614 | phosphoglucomutase activity | Carbohydrate metabolism | 36 | 0.0002 | 4 | 0.0005 | 2.1335 | 0 | 0.0000 | 0.0000 |
| GO:0000104 | succinate dehydrogenase activity | Carbohydrate metabolism | 55 | 0.0003 | 6 | 0.0007 | 2.0947 | 2 | 0.0025 | 7.2276 |
| GO:0004591 | oxoglutarate dehydrogenase (succinyl-transferring) activity | Carbohydrate metabolism | 47 | 0.0003 | 5 | 0.0006 | 2.0427 | 0 | 0.0000 | 0.0000 |
| GO:0003705 | sequence-specific enhancer binding RNA polymerase II transcription factor activity | Transcription | 151 | 0.0009 | 19 | 0.0023 | 2.4160 | 0 | 0.0000 | 0.0000 |
| GO:0031071 | cysteine desulfurase activity | Amino acid metabolism | 39 | 0.0002 | 14 | 0.0017 | 6.8927 | 0 | 0.0000 | 0.0000 |
| GO:0004478 | methionine adenosyltransferase activity | Amino acid metabolism | 51 | 0.0003 | 14 | 0.0017 | 5.2709 | 0 | 0.0000 | 0.0000 |
| GO:0004764 | shikimate 5-dehydrogenase activity | Amino acid metabolism | 45 | 0.0003 | 12 | 0.0014 | 5.1203 | 0 | 0.0000 | 0.0000 |
| GO:0004834 | tryptophan synthase activity | Amino acid metabolism | 39 | 0.0002 | 9 | 0.0011 | 4.4310 | 0 | 0.0000 | 0.0000 |
| GO:0004360 | glutamine-fructose-6-phosphate transaminase (isomerizing) activity | Amino acid metabolism | 35 | 0.0002 | 8 | 0.0010 | 4.3888 | 0 | 0.0000 | 0.0000 |
| GO:0003886 | DNA (cytosine-5-)-methyltransferase activity | Amino acid metabolism | 36 | 0.0002 | 6 | 0.0007 | 3.2002 | 2 | 0.0025 | 11.0422 |
| GO:0004372 | glycine hydroxymethyltransferase activity | Amino acid metabolism | 61 | 0.0004 | 10 | 0.0012 | 3.1477 | 0 | 0.0000 | 0.0000 |
| GO:0003861 | 3-isopropylmalate dehydratase activity | Amino acid metabolism | 34 | 0.0002 | 5 | 0.0006 | 2.8237 | 0 | 0.0000 | 0.0000 |
| GO:0004072 | aspartate kinase activity | Amino acid metabolism | 37 | 0.0002 | 5 | 0.0006 | 2.5947 | 0 | 0.0000 | 0.0000 |
| GO:0004765 | shikimate kinase activity | Amino acid metabolism | 52 | 0.0003 | 7 | 0.0008 | 2.5848 | 0 | 0.0000 | 0.0000 |
| GO:0004349 | glutamate 5-kinase activity | Amino acid metabolism | 30 | 0.0002 | 4 | 0.0005 | 2.5601 | 0 | 0.0000 | 0.0000 |
| GO:0004049 | anthranilate synthase activity | Amino acid metabolism | 53 | 0.0003 | 6 | 0.0007 | 2.1737 | 0 | 0.0000 | 0.0000 |
| GO:0004427 | inorganic diphosphatase activity | Oxidative phosphorylation | 65 | 0.0004 | 14 | 0.0017 | 4.1356 | 0 | 0.0000 | 0.0000 |
| GO:0051538 | 3 iron, 4 sulfur cluster binding | Oxidative phosphorylation | 43 | 0.0003 | 8 | 0.0010 | 3.5723 | 3 | 0.0037 | 13.8670 |
| GO:0046933 | hydrogen ion transporting ATP synthase activity, rotational mechanism | Oxidative phosphorylation | 280 | 0.0017 | 48 | 0.0057 | 3.2916 | 5 | 0.0062 | 3.5493 |
| GO:0046961 | proton-transporting ATPase activity, rotational mechanism | Oxidative phosphorylation | 285 | 0.0018 | 42 | 0.0050 | 2.8296 | 5 | 0.0062 | 3.4870 |
| GO:0008553 | hydrogen-exporting ATPase activity, phosphorylative mechanism | Oxidative phosphorylation | 86 | 0.0005 | 9 | 0.0011 | 2.0094 | 1 | 0.0012 | 2.3112 |
| GO:0004798 | thymidylate kinase activity | Pyrimidine metabolism | 33 | 0.0002 | 12 | 0.0014 | 6.9822 | 0 | 0.0000 | 0.0000 |
| GO:0004799 | thymidylate synthase activity | Pyrimidine metabolism | 33 | 0.0002 | 12 | 0.0014 | 6.9822 | 0 | 0.0000 | 0.0000 |
| GO:0004791 | thioredoxin-disulfide reductase activity | Pyrimidine metabolism | 49 | 0.0003 | 9 | 0.0011 | 3.5267 | 0 | 0.0000 | 0.0000 |
| GO:0015450 | P-P-bond-hydrolysis-driven protein transmembrane transporter activity | Others | 189 | 0.0012 | 50 | 0.0060 | 5.0797 | 1 | 0.0012 | 1.0516 |
| GO:0016836 | hydro-lyase activity | Others | 40 | 0.0002 | 10 | 0.0012 | 4.8003 | 0 | 0.0000 | 0.0000 |
| GO:0016783 | sulfurtransferase activity | Others | 33 | 0.0002 | 8 | 0.0010 | 4.6548 | 0 | 0.0000 | 0.0000 |
| GO:0004594 | pantothenate kinase activity | Others | 48 | 0.0003 | 11 | 0.0013 | 4.4003 | 0 | 0.0000 | 0.0000 |
| GO:0016624 | oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor | Others | 40 | 0.0002 | 9 | 0.0011 | 4.3202 | 0 | 0.0000 | 0.0000 |
| GO:0008312 | 7S RNA binding | Others | 76 | 0.0005 | 17 | 0.0020 | 4.2950 | 1 | 0.0012 | 2.6153 |
| GO:0008658 | penicillin binding | Others | 86 | 0.0005 | 18 | 0.0022 | 4.0188 | 0 | 0.0000 | 0.0000 |
| GO:0009374 | biotin binding | Others | 109 | 0.0007 | 22 | 0.0026 | 3.8755 | 0 | 0.0000 | 0.0000 |
| GO:0016709 | oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, NADH or NADPH as one donor, and incorporation of one atom of oxygen | Others | 50 | 0.0003 | 9 | 0.0011 | 3.4562 | 0 | 0.0000 | 0.0000 |
| GO:0046914 | transition metal ion binding | Others | 106 | 0.0007 | 19 | 0.0023 | 3.4417 | 0 | 0.0000 | 0.0000 |
| GO:0003951 | NAD+ kinase activity | Others | 67 | 0.0004 | 12 | 0.0014 | 3.4390 | 0 | 0.0000 | 0.0000 |
| GO:0016820 | hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances | Others | 74 | 0.0005 | 13 | 0.0016 | 3.3732 | 0 | 0.0000 | 0.0000 |
| GO:0051087 | chaperone binding | Others | 137 | 0.0009 | 24 | 0.0029 | 3.3637 | 3 | 0.0037 | 4.3524 |
| GO:0016884 | carbon-nitrogen ligase activity, with glutamine as amido-N-donor | Others | 109 | 0.0007 | 19 | 0.0023 | 3.3470 | 0 | 0.0000 | 0.0000 |

a The CG27 set (160,598 proteins annotated with ≥ 1 GO MF term) consists of 25 species in DEG and 2 species in the module template set.

b The occur ratio of a GO MF term is defined as the number of proteins annotated with this term divided by the total number of proteins in the set.

c The unique ratio of a GO MF term is defined as the occur ratio of the GO MF term divided by its occur ratio in the 27-species genome set.

d The core components of module templates represent the core components in module families with PPI evolution score ≥ 8 and at least one GO MF term annotation in the GO database.
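Footnotes b and c of Table 4 define the occur ratio and the unique ratio. A minimal Python sketch of the two definitions follows, assuming that the second group of value columns is computed over the 8,364 proteins of the EP8364 set; this is an inference from the reported values, not something the table states. The function and constant names are illustrative.

```python
CG27_SIZE = 160_598   # proteins in the 27-species genome set (footnote a)
EP8364_SIZE = 8_364   # essential proteins; assumed denominator (see lead-in)


def occur_ratio(n_annotated: int, set_size: int) -> float:
    """Fraction of proteins in a set that are annotated with a GO MF term."""
    return n_annotated / set_size


def unique_ratio(n_in_set: int, set_size: int,
                 n_in_background: int, background_size: int) -> float:
    """Occur ratio in a set divided by the occur ratio in the background set."""
    return (occur_ratio(n_in_set, set_size)
            / occur_ratio(n_in_background, background_size))


# GO:0004820 (glycine-tRNA ligase activity): 53 proteins in CG27, 24 essential.
glycine_ratio = unique_ratio(24, EP8364_SIZE, 53, CG27_SIZE)
# GO:0000049 (tRNA binding): 394 proteins in CG27, 172 essential.
trna_ratio = unique_ratio(172, EP8364_SIZE, 394, CG27_SIZE)

print(f"{glycine_ratio:.4f} {trna_ratio:.4f}")
```

A unique ratio above 1 means the term is over-represented in the set relative to the 27 genomes; the terms in Table 4 all have a unique ratio above 2 in the first of the two comparisons.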

Table 5. Validation of unannotated proteins in core components by the orthology database (PORC) and essential GO MF terms

| Sets (Interface evolution score) | Number of total proteins in core components | Number of unannotated proteins in core components | Validated by orthologs (PORC) ^a | Validated by 181 essential GO MF terms ^b | Validated by children of 181 essential GO MF terms ^c | Total |
|---|---|---|---|---|---|---|
| ≥ 9 | 146 | 33 | 12 | 10 | 18 | 24 (73%) |
| ≥ 8 | 850 | 400 | 76 | 101 | 116 | 198 (50%) |

a The number of proteins that have at least one orthologous protein in the PORC orthology database [15] that is recorded as an essential protein in DEG.

b The number of proteins annotated with at least one essential GO MF term.

c The number of proteins annotated with at least one child of the 181 essential GO MF terms.
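The percentages in the Total column of Table 5 follow from dividing the validated proteins by the unannotated proteins in each set. The short Python sketch below reproduces them with integer arithmetic (rounding half up); the dictionary layout and names are illustrative only. It also checks that each total is no larger than the sum of the three validation columns, which is what one would expect if some proteins are validated by more than one criterion (an interpretation, not stated in the table).

```python
# Rows of Table 5: unannotated proteins, the three validation counts, and total.
table5 = {
    ">= 9": {"unannotated": 33, "porc": 12, "go_mf": 10, "go_child": 18, "total": 24},
    ">= 8": {"unannotated": 400, "porc": 76, "go_mf": 101, "go_child": 116, "total": 198},
}


def percent_half_up(part: int, whole: int) -> int:
    """Integer percentage of part/whole, rounding halves up (49.5 -> 50)."""
    return (200 * part + whole) // (2 * whole)


percentages = {
    score: percent_half_up(row["total"], row["unannotated"])
    for score, row in table5.items()
}

# The total never exceeds the sum of the individual criteria.
overlap_consistent = all(
    row["total"] <= row["porc"] + row["go_mf"] + row["go_child"]
    for row in table5.values()
)

print(percentages, overlap_consistent)
```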

Table 6. The proteins of core components in templates with interface evolution score ≥ 9

| Uniprot AC | Module family ID ^a | Interface evolution score | Ortholog (PORC) ^b | DEG ID ^c | E-value ^d | Sequence identity ^d | Essential GO MF ID | Essential GO MF term | Child GO MF ID ^e |
|---|---|---|---|---|---|---|---|---|---|
| Q8K2B3 | 10090_440 | 10 | A0Q8D0 | DEG10120353 | 1e-171 | 0.520 | - | - | GO:0008177 |
| | | | Q02K68 | DEG10150207 | 1e-167 | 0.520 | | | |
| O75489 | | | O01602 | DEG20020046 | 3e-79 | 0.627 | - | - | GO:0008137 |
| | | | A0Q8H0 | DEG10120379 | 1e-33 | 0.590 | | | |
| O75306 | | | | | | | | | |
| Q9CQA3 | 10090_440 | 10 | Q6F8L0 | DEG10130346 | 4e-71 | 0.567 | GO:0051538 | 3 iron, 4 sulfur cluster binding | GO:0008177 |
| | | | Q02K69 | DEG10150206 | 1e-70 | 0.560 | | | |
| Q3T189 | 9913_446 | 10 | Q6F8L0 | DEG10130346 | 1e-71 | 0.569 | GO:0051538 | 3 iron, 4 sulfur cluster binding | GO:0008177 |
| | | | Q02K69 | DEG10150206 | 5e-71 | 0.552 | | | |
| P52701 | | | | | | | | | |
| | | | | | | | GO:0003682 | Chromatin binding | GO:0008270 GO:0046914 GO:0004003 |
| P04406 | 9606_280 | 9.96 | Q6F9D5 | DEG10130309; | 8e-50 | 0.376 | GO:0004365 | | |

a The module family ID is the combination of the taxonomy ID and the CORUM ID.

b The orthologs of the core protein recorded in the PORC database are essential proteins in the DEG database.

c The essential protein ID in DEG. Each ID represents an ortholog of a core component that is regarded as an essential protein. For example, the DEG ID of Q8K2B3 (Uniprot AC) is DEG10120353.

d The sequence similarity (BLASTP E-value and sequence identity) of orthologs, using the protein of the core component as the query.

e A GO MF term that is a child of an essential GO MF term could be considered an essential GO MF term.
