Network inference tools - Materials and Methods

2 Materials and Methods

2.4 Network inference tools

2.4.1 Correlation-based network inference tool

Our workflow for generating a correlation-based interaction network begins by using all samples to calculate a matrix of pairwise Pearson’s correlation coefficients between OTUs. I introduce a null model to adjust for the biases caused by the skewed distribution of OTU relative abundances, i.e., most OTUs have small relative abundance values while a few have large ones. Each pairwise Pearson’s correlation coefficient in the matrix is corrected by subtracting the expected correlation generated by the null model. The following paragraphs explain how the pairwise relationships and the null model are obtained.

Correlation-based methods are widely used to interpret the relationships between pairs of OTUs, and the overall flowchart for generating correlation-based interactions is shown in Figure 1. However, it is necessary to use a null model to adjust all pairwise correlations and to examine how well the features of the dataset represent the relationships between each pair of OTUs. Moreover, when applying correlation-based methods to relative abundances, i.e., when the sum of all OTU abundances is normalized to one, it is important to note that the relative abundance of each OTU is already a skewed version of the original abundance (Weiss, Van Treuren et al. 2016). Therefore, the purpose of a null model is to evaluate the correlation values expected between each pair of OTUs if that pair had no real relationship (Ulrich and Gotelli 2010). In other words, the null model estimates the strength of pairwise correlation that arises even when there is no real relationship between a pair of OTUs.

Here, I describe the OTU shuffling process that the null model uses for our 16S rRNA dataset. In each iteration, one OTU is selected as the “focal OTU” (Figure 2). A null abundance matrix is created by randomly resampling, across all samples and without replacement, the relative abundances of every OTU except the focal OTU. Once the null matrix has been created, Pearson’s correlation coefficient is calculated between the focal OTU and each of the randomized OTUs. This calculation is repeated 200 times; in other words, I create 200 null matrices, which yield 200 Pearson’s correlation coefficients between the focal OTU and each other OTU. The “expected” correlation for the focal OTU is the average over these 200 random iterations. I repeated this process with each OTU as the focal OTU, which generated the expected matrix of OTU correlations. In the last step, the expected Pearson’s correlations are subtracted from the observed Pearson’s correlations to produce the adjusted Pearson’s correlation coefficient matrix, in which each value is the observed correlation minus the expected correlation for a given pair of OTUs.
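The shuffling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this study: the function names, the toy abundance table, the fixed random seed, and the choice to leave the diagonal at zero are all hypothetical. Each OTU is a list of relative abundances across samples.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def expected_correlations(abund, focal, n_iter=200, rng=None):
    """Mean null correlation between the focal OTU and every other OTU."""
    rng = rng or random.Random()
    totals = [0.0] * len(abund)
    for _ in range(n_iter):
        for j, values in enumerate(abund):
            if j == focal:
                continue
            # Shuffle the non-focal OTU across samples (resampling without replacement).
            shuffled = rng.sample(values, len(values))
            totals[j] += pearson(abund[focal], shuffled)
    return [t / n_iter for t in totals]

def adjusted_correlation_matrix(abund, n_iter=200, seed=0):
    """Observed minus expected Pearson correlation for every ordered OTU pair."""
    rng = random.Random(seed)
    n = len(abund)
    adjusted = [[0.0] * n for _ in range(n)]  # diagonal left at 0.0 (assumption)
    for i in range(n):
        expected = expected_correlations(abund, i, n_iter, rng)
        for j in range(n):
            if i != j:
                adjusted[i][j] = pearson(abund[i], abund[j]) - expected[j]
    return adjusted

# Hypothetical toy table: 3 OTUs x 6 samples; OTU 0 and OTU 1 co-vary perfectly.
toy = [
    [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
    [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    [0.30, 0.10, 0.25, 0.05, 0.20, 0.15],
]
adj = adjusted_correlation_matrix(toy, n_iter=200, seed=1)
```

Because the expected values are computed separately for each focal OTU, the adjusted matrix produced by this sketch is not guaranteed to be exactly symmetric.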

2.4.2 Regression-based network inference tool

The recent stand-alone tool metagenomic microbial interaction simulator (MetaMIS) was developed on the basis of the generalized Lotka-Volterra (gLV) model (Bhargava 1989) to facilitate the systematic inference of microbial interactions from metagenomic abundance profiles (Shaw, Pao et al. 2016). MetaMIS is designed to provide a user-friendly platform for analyzing NGS data and constructing the microbial interactions in a community. We used MetaMIS to infer microbial relationships from a single time-series dataset of microbial composition containing 12 time points. Lotka-Volterra equations have been widely used to infer animal interactions in dynamic systems and have recently been applied to reveal interactions between operational taxonomic units (OTUs). The detailed algorithms and equations were described by Bucci et al. (Stein, Bucci et al. 2013). Due to limited computing power (Intel® Core™ i7-4770 CPU @ 3.40 GHz processor and 32 GB RAM), it was not feasible to infer the interaction network at the species level. Therefore, we assumed that, in general, all species within a genus identified in our amplicon study share similar functions. From the genus-level compositional profiles across the 12 time points, MetaMIS can systematically examine interaction patterns such as mutualism or competition; only the top 25% of interacting relationships were considered in this study. We used the mfinder tool (Milo, Shen-Orr et al. 2002) to identify significant 3-node directed motifs that contain two directed edges pointing to the same node. This process was repeated 100 times, and all relationships that passed the criteria described above at least 50 times (permutation cutoff > 0.5) were considered reliable interacting relationships.
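One possible reading of the repetition filter is sketched below in Python. It is an illustration only and does not reproduce MetaMIS or mfinder: the function names, the data layout (each run is a dictionary mapping a directed edge to its inferred strength), the ranking by absolute strength, and the toy runs are all assumptions, and the motif step is left out. An edge is kept when it falls in the top 25% of a run in at least half of the runs.

```python
from collections import Counter

def top_fraction(edges, fraction=0.25):
    """Return the strongest `fraction` of edges, ranked by absolute strength."""
    ranked = sorted(edges, key=lambda e: abs(edges[e]), reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return set(ranked[:n_keep])

def consensus_edges(runs, fraction=0.25, min_support=0.5):
    """Keep edges that pass the top-fraction filter in at least min_support of runs."""
    counts = Counter()
    for edges in runs:
        counts.update(top_fraction(edges, fraction))
    needed = min_support * len(runs)
    return {edge for edge, count in counts.items() if count >= needed}

# Hypothetical toy input: 4 runs, 4 candidate edges each (source genus, target genus).
runs = [
    {("A", "B"): 0.9, ("B", "C"): 0.2, ("C", "D"): 0.1, ("D", "A"): -0.3},
    {("A", "B"): 0.8, ("B", "C"): 0.1, ("C", "D"): 0.2, ("D", "A"): -0.1},
    {("A", "B"): 0.7, ("B", "C"): 0.3, ("C", "D"): 0.1, ("D", "A"): -0.2},
    {("A", "B"): 0.1, ("B", "C"): 0.2, ("C", "D"): 0.9, ("D", "A"): -0.3},
]
reliable = consensus_edges(runs)
```

With 100 runs, `min_support=0.5` corresponds to an edge being retained at least 50 times.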

2.4.3 Training dataset and validation

To choose a network inference tool, I compared the performance of the correlation-based and regression-based methods using in silico datasets. For each network inference tool, it is critical to have a metric that shows whether the tool recovers the correct topology or structure of the simulated interaction network. More specifically, the sensitivity, specificity, and accuracy of the network inference tools in predicting the patterns of interactions must be established beforehand. In the following two paragraphs, I describe how the training datasets were generated and how the network inference tools were evaluated on them.

To compare the performance of the correlation-based and regression-based network inference tools, I simulated a time-series microbial community consisting of 50 OTUs. Ten time points were generated using the ode function (ordinary differential equation solver) in the deSolve R package. The model used in this simulation is as follows:

$$\frac{dx_i}{dt} = \alpha_i x_i + \sum_{j=1}^{N} \beta_{ij} x_i x_j$$

where $x_i$ is the abundance of microbe $i$, $N$ is the number of OTUs, the parameter $\alpha_i$ is the intrinsic positive growth rate of microbe $i$, and the interaction rate $\beta_{ij}$ describes how microbe $j$ affects the growth rate of microbe $i$. The self-interaction parameters $\beta_{ii}$ are constrained to be -0.5. The intrinsic growth rate and initial abundance of each OTU are shown in Table 4. The initial abundances of OTUs in the simulated community follow a Poisson distribution because most gut bacteria in the treefrog are at low abundance, while only a few bacteria are dominant in the intestine. All potential interactions among bacteria in the simulated community are given in an interaction matrix (Supplementary Table 1). Based on the model and parameter settings described above, the complete simulated time-series dataset, consisting of 50 OTUs over 10 time points, is shown in Figure 3.
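To make the simulation model concrete, the sketch below integrates the same generalized Lotka-Volterra equation for a small community. It is not the deSolve workflow used in this study: it swaps in a hand-written fourth-order Runge-Kutta integrator in Python, and the three-OTU growth rates, interaction matrix, initial abundances, and step sizes are hypothetical values chosen for illustration (only the diagonal value of -0.5 follows the text).

```python
def glv_rhs(x, alpha, beta):
    """Right-hand side: dx_i/dt = alpha_i * x_i + sum_j beta_ij * x_i * x_j."""
    n = len(x)
    return [
        alpha[i] * x[i] + sum(beta[i][j] * x[i] * x[j] for j in range(n))
        for i in range(n)
    ]

def rk4_step(x, h, alpha, beta):
    """Advance the state by one classical Runge-Kutta step of size h."""
    k1 = glv_rhs(x, alpha, beta)
    k2 = glv_rhs([xi + 0.5 * h * k for xi, k in zip(x, k1)], alpha, beta)
    k3 = glv_rhs([xi + 0.5 * h * k for xi, k in zip(x, k2)], alpha, beta)
    k4 = glv_rhs([xi + h * k for xi, k in zip(x, k3)], alpha, beta)
    return [
        xi + h / 6.0 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]

def simulate(x0, alpha, beta, n_points=10, dt=1.0, substeps=100):
    """Return abundances at n_points time points spaced dt apart (first point is x0)."""
    x = list(x0)
    trajectory = [list(x)]
    h = dt / substeps
    for _ in range(n_points - 1):
        for _ in range(substeps):
            x = rk4_step(x, h, alpha, beta)
        trajectory.append(list(x))
    return trajectory

# Hypothetical 3-OTU community; beta[i][j] is the effect of OTU j on OTU i.
alpha = [0.5, 0.4, 0.3]
beta = [
    [-0.5, 0.10, 0.00],
    [-0.1, -0.50, 0.05],
    [0.0, 0.10, -0.50],
]
x0 = [0.2, 0.1, 0.05]
series = simulate(x0, alpha, beta, n_points=10)
```

With a single OTU the model reduces to logistic growth, so an isolated OTU with growth rate 0.5 and self-interaction -0.5 settles at an abundance of 1, which is a convenient sanity check for any implementation.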

In my evaluation, I count the true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) in the interaction networks. TP is the number of truly existing edges that the network inference tool detects, while FN is the number of truly existing edges that it fails to detect. For edges that are not present in the true network, FP is the number that the tool mistakenly predicts, while TN is the number whose absence the tool correctly predicts. Using these definitions, I calculated sensitivity (i.e., true positive rate) and specificity (i.e., true negative rate), which are defined as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

The accuracy is defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP}$$
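As a worked example of these three measures, the Python sketch below counts TP, FN, FP, and TN from a true and a predicted set of directed edges. The function names, the four-node toy network, and the decision to exclude self-loops from the set of candidate edges are assumptions made for illustration.

```python
from itertools import permutations

def confusion_counts(true_edges, predicted_edges, nodes):
    """Count TP, FN, FP, TN over all ordered node pairs (self-loops excluded)."""
    all_pairs = set(permutations(nodes, 2))
    tp = len(true_edges & predicted_edges)
    fn = len(true_edges - predicted_edges)
    fp = len(predicted_edges - true_edges)
    tn = len(all_pairs - true_edges - predicted_edges)
    return tp, fn, fp, tn

def metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and accuracy as defined in the text."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical toy network with 4 OTUs, i.e. 12 possible directed edges.
nodes = ["A", "B", "C", "D"]
true_edges = {("A", "B"), ("B", "C"), ("C", "D")}
predicted_edges = {("A", "B"), ("B", "C"), ("D", "A")}

counts = confusion_counts(true_edges, predicted_edges, nodes)
sensitivity, specificity, accuracy = metrics(*counts)
```

In this toy case two of the three true edges are recovered and one absent edge is wrongly predicted, so the counts are TP = 2, FN = 1, FP = 1, and TN = 8.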

The values of sensitivity, specificity, and accuracy range from zero to one. A low sensitivity indicates that the algorithm used in a network tool does not capture the interactions that truly exist in the network. Conversely, a low specificity indicates that the network tool predicts many interactions that do not exist in the real network. We applied two different criteria to define the absence of inferred interactions for the correlation-based and regression-based methods because these two network tools are based on different algorithms. For the correlation-based method, all inferred interactions with a strength of less than 0.6 are considered to be no interaction, following the general criterion for Pearson’s correlations. For the regression-based method, only interactions in the top 2/3 of interacting strength are considered real interactions, while inferred interactions with weak interacting strength (i.e., < 1/3 of the highest interacting strength) are considered to be no interaction. For the regression-based method, I additionally set a stricter criterion based on interacting strength to obtain more convincing interaction pairs (i.e., TP) from the inferred network: I require not only the presence of the interaction and the same interacting direction, but also that the difference between the inferred and simulated interaction strengths be less than 0.2.
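The cutoff rules above can be written as three small predicates. This Python sketch rests on several assumptions that the text does not state: the correlation cutoff is applied to the absolute value of the coefficient, "same interacting direction" is read as the inferred and simulated strengths having the same sign for the same ordered OTU pair, and all function names and example values are hypothetical.

```python
def correlation_edge_present(r, cutoff=0.6):
    """Correlation-based rule: strengths below the cutoff count as no interaction.

    Assumption: the cutoff is applied to |r|, so strong negative
    correlations are also kept.
    """
    return abs(r) >= cutoff

def regression_edge_present(strength, max_strength):
    """Regression-based rule: below 1/3 of the highest strength counts as no interaction."""
    return abs(strength) >= max_strength / 3.0

def strict_true_positive(inferred, simulated, tolerance=0.2):
    """Stricter TP rule for the regression-based method.

    Requires that both interactions exist, share the same sign (assumed
    meaning of "direction"), and differ in strength by less than tolerance.
    """
    if inferred == 0 or simulated == 0:
        return False
    same_sign = (inferred > 0) == (simulated > 0)
    return same_sign and abs(inferred - simulated) < tolerance

# Hypothetical example values.
examples = {
    "weak_correlation": correlation_edge_present(0.59),
    "strong_negative_correlation": correlation_edge_present(-0.7),
    "weak_regression": regression_edge_present(0.3, max_strength=1.0),
    "close_match": strict_true_positive(0.5, 0.6),
    "sign_mismatch": strict_true_positive(-0.5, 0.5),
}
```

Keeping the rules as separate functions makes it straightforward to change one cutoff and recompute sensitivity, specificity, and accuracy without touching the rest of the evaluation.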

In: Exploring the Gut Bacteria of the Spot-legged Treefrog and Their Network Relationships (pages 44-49)