CHAPTER 3 NETWORK MOTIF EXPERIMENTS

3.2 Specific: Bridge and Brick Network Motif-Detecting Algorithms

The above-described method successfully addresses some problems in specific domains. However, in biology, the concept of "neighborhood" is especially useful for identifying motifs or modules. Accordingly, in this section, I propose another version of bridge and brick network motifs in the context of biology.

3.2.1 Validation

To validate the respective roles of weak and strong links, equal percentages of each (as well as random links) were removed. For E. coli and S. cerevisiae [16, 26], the greater the number of strong links removed, the lower the clustering coefficient relative to the randomly removed links. In contrast, the greater the number of weak links removed, the higher the clustering coefficient relative to the randomly removed links (Figs. 19 and 20). Note that the average clustering coefficient increases when weak links are removed; that is, when the clustering coefficient of a weak link's end node is calculated, its neighbors no longer include the same link's other end node. The average coefficient increases after the weak links are removed because the two end nodes do not share a large number of common neighbors. The average degree of separation in the network after removing links was not computed, since a network might become disconnected after a link is removed, and the definition of average degree of separation is based on a connected network.

The proposed approach is also insensitive to data errors: significant network motif sets in the two gene regulation networks do not change a great deal, even when 40% of their edges are removed (Figs. 21 and 22). All altered results (red curves) shown in Figures 21 and 22 represent average values for 30 runs. The sensitivity analysis results confirmed significant similarities between the original and altered networks after randomly removing 40% of their links. According to the triad significance profile (TSP) [26] of brick motifs, the original and altered networks belong to the same superfamily.
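The link-removal argument above can be illustrated on a toy graph. The following is a minimal sketch, not the procedure used in the experiments: it assumes an undirected view of the network and uses the number of common neighbors of a link's two end nodes as a stand-in for the link weight (the helper names and the toy edge list are hypothetical).

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than two neighbors count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count pairs of neighbors that are themselves linked.
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * closed / (k * (k - 1))
    return total / len(adj)


def common_neighbors(adj, u, v):
    """Assumed link weight: number of neighbors shared by the two end nodes."""
    return len(adj[u] & adj[v])


def without_link(adj, u, v):
    """Return a copy of the adjacency map with link (u, v) removed; nodes are kept."""
    new_adj = {node: set(nbrs) for node, nbrs in adj.items()}
    new_adj[u].discard(v)
    new_adj[v].discard(u)
    return new_adj


# Toy graph: a triangle (a, b, c) plus a pendant link (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj = build_adjacency(edges)

baseline = average_clustering(adj)
# (c, d) has no shared neighbors, so it plays the role of a weak link.
after_weak = average_clustering(without_link(adj, "c", "d"))
# (a, b) shares neighbor c, so it plays the role of a strong link.
after_strong = average_clustering(without_link(adj, "a", "b"))
```

In this toy case, removing the weak-like link raises the average clustering coefficient above the baseline, while removing the strong-like link lowers it, which is the qualitative pattern reported for Figs. 19 and 20.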

As shown in Figure 23, the link weight distribution is extremely polarized (either 0 or >2), which matches the criterion for distinguishing between strong and weak links (i.e., mean weighted value LinkAVG = 0.9 and standard deviation LinkSTD = 0.04 for all links in 1,000 randomized networks). In most cases, random networks have many more weak links than strong links. At least one researcher has suggested that a high degree of clustering is a generic feature of biological networks [65].
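To see why a polarized weight distribution separates cleanly around the randomized baseline, the weights can be expressed as deviations from LinkAVG in units of LinkSTD. The sketch below is illustrative only: the two constants come from the text, but the thresholding rule and the cutoff `k` are assumptions, since the exact criterion is defined elsewhere in the thesis.

```python
LINK_AVG = 0.9   # mean link weight over 1,000 randomized networks (from the text)
LINK_STD = 0.04  # standard deviation over the same ensemble (from the text)


def z_score(weight, avg=LINK_AVG, std=LINK_STD):
    """Deviation of a link weight from the randomized mean, in standard deviations."""
    return (weight - avg) / std


def classify_link(weight, k=2.0, avg=LINK_AVG, std=LINK_STD):
    """Hypothetical rule: strong above avg + k*std, weak below avg - k*std."""
    z = z_score(weight, avg, std)
    if z > k:
        return "strong"
    if z < -k:
        return "weak"
    return "undetermined"


# Weights at the two poles of the observed distribution (0 or >2).
labels = {w: classify_link(w) for w in (0.0, 2.5, 3.0)}
```

Under these assumptions a weight of 0 lies about 22.5 standard deviations below the randomized mean and any weight above 2 lies more than 27 above it, so the classification is insensitive to the particular choice of `k`.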

The link property is a good indicator of cellular function robustness. The simplest strategy for protecting against the failure of a specific component is to provide alternative ways to perform that component's function. At the molecular level, this backup strategy (or genetic buffering) [103, 104] can be carried out by duplicate genes with identical roles or by different genes that constitute an alternate but functionally overlapping path [83]. Researchers can use brick motifs to explore (a) identical genes that diverge functionally, (b) reasons why the biological networks of unreliable elements still perform reliably [1], and (c) the degeneracy phenomenon [65].

Fig. 19. Relationships between clustering coefficients and different removal ratios for three E. coli link types. Red, random; green, strong; blue, weak.


Fig. 20. Relationships between clustering coefficients and different removal ratios for three S. cerevisiae (yeast) link types. Red, random; green, strong; blue, weak.

Fig. 21. Comparison of original (blue curve) and altered (red curve) brick motif ratio profiles for E. coli after randomly removing 40% of its links. Altered results represent average values for 30 runs. Correlation coefficient c = 0.97.


Fig. 22. Comparison between original brick motif ratio profiles and altered brick motif ratio profiles for S. cerevisiae (yeast) after randomly removing 40% of its links.

Fig. 23. Distribution of link weights in E. coli. The mean and standard deviation of link weights for randomized networks were calculated as 0.90 ± 0.04.

3.2.2 Experiments

The proposed method was applied to E. coli (bacteria) and S. cerevisiae (yeast) transcriptional gene regulation networks [26]. Network and source data are listed in Tables 4 and 5. In both networks, nodes represent operons (i.e., one or more genes transcribed on the same mRNA [26]) and directed links represent transcriptional regulatory relationships between operons that encode transcription factors (TFs) and operons regulated by TFs. Many v-out and feed-forward loop (FFL) brick motifs were observed in both E. coli and S. cerevisiae (ID = 1 and 5, respectively) (Table 5). The FFL (a three-gene subgraph) is composed of two input transcription factors, one regulating the other and both jointly regulating a target gene [80]. The observation that FFL bridge motifs do not exist in either network supports previous findings indicating that most motifs do not function in isolation, but overlap with known biological functions [24, 42, 100]. Specifically, one FFL motif cluster overlaps with the flagella motor module and another contains a significant number of elements responsible for regulating the E. coli aerobic/anaerobic switch [84]. Since most FFL motifs consist of strong links, it is suggested that many (if not all) FFL motif interactions can be used as parts of other motifs or modules (e.g., for flagella motor, osmoregulated porin gene, oxidative stress response, and methionine biosynthesis modules) in a manner that makes the most efficient use of each gene or operon archive [84]. Accordingly, FFL brick motifs are viewed as having an optimal design in terms of convergent evolution in transcriptional gene regulation networks [105].
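The FFL structure described above (X regulates Y, and both X and Y regulate Z) can be located in a directed edge list with a short search. This is a minimal sketch of the subgraph pattern only; it does not compute statistical significance or the bridge/brick distinction, and the function name and toy edge list are hypothetical.

```python
def find_ffls(edges):
    """Return all (x, y, z) triples of distinct nodes with x->y, x->z and y->z."""
    succ = {}
    for u, v in edges:
        if u != v:  # ignore self-regulation
            succ.setdefault(u, set()).add(v)
    ffls = []
    for x in sorted(succ):
        for y in sorted(succ[x]):
            for z in sorted(succ.get(y, ())):
                # z must be a third node that x also regulates directly.
                if z != x and z in succ[x]:
                    ffls.append((x, y, z))
    return ffls


# Toy network: one FFL (X, Y, Z) plus an unrelated downstream link.
toy_edges = [("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "W")]
toy_ffls = find_ffls(toy_edges)
```

Note that this search reports every occurrence of the three required links, even when additional links exist among the same three nodes; an induced-subgraph count would need an extra check.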

The other motif type that is well represented in both networks is the four-gene bi-fan pattern associated with bridge motifs (Table 5). The bi-fan consists of two input transcription factors, one never regulating the other, but both jointly regulating two target genes. In E. coli, 208 of the 209 observed bi-fan motifs combined to create dual motif clusters in which most links are shared by at least two adjacent motifs in addition to multiple non-adjacent motifs [84]. No bi-fan brick motifs were found, but 107 bi-fan bridge motifs that did not overlap with other motifs were noted, indicating that they function by themselves. These observations suggest a low co-regulation ratio for two operons in which one regulates the other.
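The bi-fan definition above translates directly into a search over pairs of regulators. The sketch below is an illustration under stated assumptions: the function name is hypothetical, only the condition named in the text (neither regulator regulates the other) is enforced, and the edge list mirrors the aroL, mtr, TrpR, and TyrR example from this section without regard to activation or repression.

```python
from itertools import combinations


def find_bifans(edges):
    """Return ((r1, r2), (t1, t2)) tuples where both regulators regulate both targets."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    bifans = []
    for r1, r2 in combinations(sorted(succ), 2):
        # The text requires that neither regulator regulates the other.
        if r2 in succ[r1] or r1 in succ[r2]:
            continue
        shared = sorted((succ[r1] & succ[r2]) - {r1, r2})
        for t1, t2 in combinations(shared, 2):
            bifans.append(((r1, r2), (t1, t2)))
    return bifans


# Regulatory links taken from the aroL/mtr/TrpR/TyrR example (signs omitted).
example_edges = [
    ("TyrR", "aroL"), ("TyrR", "mtr"),
    ("TrpR", "aroL"), ("TrpR", "mtr"),
]
example_bifans = find_bifans(example_edges)
```

Adding a link between the two regulators would turn the pattern into a different four-node subgraph, which is why the search skips such pairs.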

Using the bi-fan bridge motif consisting of aroL, mtr, TrpR, and TyrR as an example, the combination of the TyrR protein and TrpR repressor is responsible for regulating other aromatic amino acid transport genes [106]. The TyrR protein plus either phenylalanine or tyrosine is responsible for mtr gene activation, while a combination of the TrpR repressor plus tryptophan represses the mtr gene [107]. Both TyrR and TrpR regulate the expression of aroL, the gene encoding the enzyme shikimate kinase II in E. coli [84]. Also found were 51 brick motifs (ID = 206) consisting of combinations of FFL and bi-fan motifs. As Dobrin [98] reports, these motifs form a heterologous motif superstructure. The present results for S. cerevisiae are similar to those for E. coli. After comparing these with the results of Milo et al. [16], it was determined that v-out (ID = 1) and FFL brick motifs (ID = 5) play important roles in both networks (Figs. 24 and 25). Furthermore, the brick motif ratio profiles in the two gene regulation networks are very similar (correlation coefficient c = 0.96) (Fig. 26), even though they contain relatively few brick motifs [16].
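Profile comparisons of this kind reduce to a correlation between two vectors indexed by motif ID. The following sketch assumes that the reported coefficient is a Pearson correlation, and it also shows the unit-length normalization commonly used for significance profiles built from Z scores; the function names are hypothetical and no values from the thesis are reproduced.

```python
import math


def significance_profile(z_scores):
    """Normalize a vector of motif Z scores to unit length."""
    norm = math.sqrt(sum(z * z for z in z_scores))
    if norm == 0:
        return [0.0 for _ in z_scores]
    return [z / norm for z in z_scores]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


# Illustrative values only, not data from the thesis.
profile_a = significance_profile([3.0, 4.0])
same_shape = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_shape = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the normalization removes the overall scale, networks of different sizes can be compared on the shape of their profiles alone.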

An effort was made to learn more about the relationship between coherent (incoherent) FFLs [88] and brick (bridge) FFLs. Since each of the three FFL interactions can be either activating or repressing, FFLs have eight possible structural types [80, 84]. The four incoherent FFL types act as sign-sensitive accelerators that shorten the response time of target gene expression following stimuli in one direction (e.g., off to on), but not the other. The four coherent FFL types act as sign-sensitive delays. E. coli contains 34 coherent FFLs, 8 incoherent FFLs [84], 29 brick-coherent FFLs, and 6 brick-incoherent FFLs. Accordingly, the difference in coherent (incoherent) FFL frequencies cannot be simply explained by the relative abundances of brick and bridge motifs in a network.
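The count of eight structural types, and the comparison of brick fractions, can be checked with a few lines of arithmetic. This sketch assumes the usual definition of coherence (the sign of the direct path X to Z equals the product of the signs along the indirect path X to Y to Z); the four counts are those quoted above, and the variable names are hypothetical.

```python
from itertools import product


def ffl_types():
    """Enumerate all sign assignments (x->y, y->z, x->z) and label each one."""
    types = []
    for xy, yz, xz in product((+1, -1), repeat=3):
        indirect = xy * yz  # net sign of the path x -> y -> z
        kind = "coherent" if indirect == xz else "incoherent"
        types.append(((xy, yz, xz), kind))
    return types


types = ffl_types()
n_coherent = sum(1 for _, kind in types if kind == "coherent")
n_incoherent = len(types) - n_coherent

# (brick FFLs, all FFLs) for E. coli, as quoted in the text.
counts = {"coherent": (29, 34), "incoherent": (6, 8)}
brick_fraction = {kind: brick / total for kind, (brick, total) in counts.items()}
```

The enumeration yields four coherent and four incoherent types, and the brick fractions come out at roughly 85% for coherent and 75% for incoherent FFLs, so brick FFLs make up a large majority of both classes.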

Table 4. Descriptions of five gene regulation networks: edge and node definitions, network sizes, and references.

Network type: Gene. Common feature: directed network.

Network        Nodes  Edges  Description
S. cerevisiae  688    1079   Saccharomyces cerevisiae [108]
Drosophila     110    307    Drosophila melanogaster (www.csa.ru/Inst/gorb_dep/inbios/genet/genet.htm)
Sea urchin     43     58     Sea urchin [108]
C. elegans     280    2170   C. elegans (all synaptic connections used; not restricted to those with ≥ 5 synapses) [26]


Table 5. Brick and bridge motifs in five gene regulation networks.

Network Nodes Links Motif Type ID


Next, the proposed method was applied to transcription networks that guide development in Drosophila melanogaster and sea urchin, and to synaptic wiring in Caenorhabditis elegans (Table 4). As in the two gene regulation networks, brick TSPs were more significant than bridge TSPs in these three networks. However, it was also determined that four bridge motifs (ID = 5, 6, 11, and 12) in C. elegans are very significant (Table 5), indicating the greater presence of isolated motifs. This suggests that these bridge motifs constitute the main difference between the C. elegans network and the Drosophila and sea urchin networks (Fig. 27). Similarities (differences) in bridge and brick motifs imply similar (different) key circuit elements in each organism.

Fig. 24. Comparisons of triad significance profiles (TSPs) for our bridge and brick motifs and Milo et al.’s [7], [28] E. coli motifs.


Fig. 25. Comparisons of triad significance profiles (TSPs) for our bridge and brick motifs and Milo et al.’s [7], [28] S. cerevisiae (yeast) motifs.

Fig. 26. Brick motif ratio profiles for two gene regulation networks: E. coli and S. cerevisiae (yeast). Correlation coefficient c = 0.96.

Fig. 27. Bridge motif ratio profiles for three gene regulation networks: C. elegans, sea urchin, and Drosophila.

3.2.3 Conclusion

According to the above definitions of weighted links and network motifs and the results of validation experiments using two gene transcription regulation networks, the presence of bridge and brick motifs in a biological network is closely associated with network topological structures (especially local connections) but not with network size (i.e., number of nodes). Bridge motifs can assist in the identification of isolated motifs, and brick motifs can be used to locate motifs whose functions overlap.

This combination of a statistically significant motif and strong or weak-link properties provides insight into the structural organizing principles and functions of networks. It can also serve as a method for analyzing biological system robustness.


Chapter 4 Social Network Simulation
