1 Introduction
1.4 Network inference provides insight into the mechanisms of gut bacterial assembly
1.4.1 The potential of top-down approaches for studying microbial communities
The use of genome sequences and related approaches (Pace 1997, Giovannoni and Stingl 2005) has removed the need for cultivation to characterize and identify microorganisms in nature. Advances in sequencing technologies, coupled with new bioinformatic developments, have allowed researchers to investigate microbes in virtually every habitat (Gilbert and Dupont 2011). Over the last 10–15 years, the number of studies on the composition of microbial communities has increased exponentially. High-throughput sequencing (i.e., next-generation sequencing, NGS) technologies have generated massive numbers of reads from the small-subunit ribosomal RNA gene (16S rRNA), which reveal the composition of microbial communities. Several 16S rRNA processing pipelines have been established using software packages such as MOTHUR (Schloss, Westcott et al. 2009) or Quantitative Insights Into Microbial Ecology (QIIME) (Caporaso, Kuczynski et al. 2010). By following the analytical pipelines of the 16S rRNA protocol, including the removal of multiple sources of potential bias in 16S rRNA sequences generated by pyrosequencing (Kunin, Engelbrektson et al. 2010), I can describe the composition and diversity of microbial communities and how these communities change across time, geographical space, or experimental treatments. These pipelines yield highly comparable views of microbiomes under different experimental treatments (Hsiao, McBride et al. 2013, Ridaura, Faith et al. 2013, Kohl, Amaya et al. 2014, Sommer, Ståhlman et al. 2016).
High-throughput studies allow us to reveal microbial composition. Studies of microbial compositional structure provide genomic information about community members, from which the potential metabolic capacity of the community can be deduced. However, one limitation of this approach is that the mechanisms shaping microbial structure are hard to interpret from compositional data alone. Deciphering microbial interactions with statistical algorithms or models therefore offers a way to understand the mechanisms of microbial community assembly. Inferring the structure of microbial interaction networks from 16S rRNA amplicon data is currently attracting growing interest. Recent advances in systems microbiology include methods that characterize microbial interaction networks from longitudinal studies or time-series data. Such microbiome data can be used to infer an interaction structure and to examine the static or dynamic properties of biological processes at the system level.
1.4.2 Deciphering microbial interaction provides insights into causal relationships
The traditional tools of microbiology for deciphering microbial interactions, such as pure cultures and genetic studies, tend to study each microorganism in isolation and to characterize direct interactions between different isolates by co-culturing them on the same plate. This reductionist approach provides valid information on pairwise interactions, or on interactions among a few bacterial strains, in small-scale communities. However, it is not well suited to learning about microbial interactions in the real world, because a few bacterial isolates do not reflect a natural community. In addition, natural environmental conditions are hard to mimic in the laboratory. Holistic approaches, which consider all bacteria in the community and study natural habitats directly, can yield comprehensive data from which more reliable microbial interactions can be deduced.
Gut microbiota are complex in both compositional structure and metabolic function owing to their reproductive variability, their ability to self-reproduce, and even host temporal dynamics. This complexity can be well represented by computational modeling approaches such as network analysis. Graph-theoretical, top-down, system-oriented approaches can facilitate deciphering microbial interactions and characterizing potential interacting modules as representations of microbial consortia. They can also enhance our understanding of complex evolutionary and ecological processes.
Network approaches that model co-occurrence or causal relationships among microorganisms provide insights for microbiome studies in two ways. They identify microbial relationships essential for community assembly, and they help to deduce the potential influence of various interactions on host health.
Deciphering the microbial interactions of gut microbiota has received much attention in recent years. We still know little about how microbial community structure in the intestine relates to its symbiotic functions, and inferred interaction relationships can help close this gap. To engineer gut microbiota for host physiological benefits, we must understand the rules of microbial community assembly and the interaction structure, i.e., the direct interactions between community members.
Microbes do not exist in isolation and are often found in consortia of different microbial populations (Handelsman 2004). Members of a microbial assemblage potentially interact with each other and with organic chemicals in the environment (Konopka 2009). Within a community of species sharing limited resources, interactions can be described as “positive”, “negative”, or “no impact”, depending on the type of relationship between members. Deciphering microbe-microbe interactions in the gut is therefore likely to extend our knowledge of the functions of microbial symbionts. However, inferring microbial interactions is far from straightforward because of the high complexity of microbial communities (Faust and Raes 2012). In the following section, we review methods for characterizing interactions among microorganisms and describe the current gaps in network research.
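The sign-based description of interactions can be made concrete. Suppose the effect of member A on member B and the effect of B on A are each summarized as positive, negative, or zero. The six possible combinations then correspond to the classical ecological relationship types. The sketch below assumes such signed effects are already available, for example as coefficients of a fitted model. The function names and the tolerance `eps` are hypothetical.

```python
def effect_sign(effect: float, eps: float = 1e-9) -> str:
    """Map a signed effect of one member on another to '+', '-' or '0'."""
    if effect > eps:
        return "+"
    if effect < -eps:
        return "-"
    return "0"


# Unordered pairs of signs mapped to classical ecological relationship names.
RELATIONSHIPS = {
    frozenset({"+"}): "mutualism",                  # (+, +)
    frozenset({"-"}): "competition",                # (-, -)
    frozenset({"+", "-"}): "parasitism/predation",  # (+, -)
    frozenset({"+", "0"}): "commensalism",          # (+, 0)
    frozenset({"-", "0"}): "amensalism",            # (-, 0)
    frozenset({"0"}): "neutralism",                 # (0, 0)
}


def interaction_type(effect_ab: float, effect_ba: float, eps: float = 1e-9) -> str:
    """Name the relationship from the effect of A on B and the effect of B on A."""
    signs = frozenset({effect_sign(effect_ab, eps), effect_sign(effect_ba, eps)})
    return RELATIONSHIPS[signs]


print(interaction_type(0.4, 0.2))    # mutualism
print(interaction_type(-0.3, 0.0))   # amensalism
print(interaction_type(0.5, -0.1))   # parasitism/predation
```

The tolerance matters in practice. Estimated effects are rarely exactly zero, so what counts as “no impact” depends on where that threshold is set.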
1.4.3 Network inference theory
A variety of methods have been used to infer microbial interaction networks from microbiome data. These methods differ in accuracy, speed, efficiency, and computational requirements. They can be categorized into simple pairwise correlation methods, such as Pearson or Spearman correlation, and more complex approaches such as multiple regression methods and probabilistic graphical models (e.g., Bayesian models). Some of these methods are very popular for microbiome data because they are easy to use and computationally fast (Faust and Raes 2012). Probabilistic graphical models, by contrast, have not yet been applied extensively to address biological questions, although they have been successful in other scientific fields, with high accuracy and minimal bias. Here I discuss two kinds of methods that are widely applied to microbiome data to infer interaction networks.
The first method, undirected co-occurrence networks, is relatively simple and requires little computational time or resources. It uses a pairwise dissimilarity index such as Bray–Curtis or Kullback–Leibler to infer a co-occurrence network from operational taxonomic units (OTUs) defined from microbiome data. The statistical significance of each pairwise dissimilarity can be evaluated by permutation tests. All significant pairwise relationships are then aggregated to infer the microbial interaction network (Faust, Sathirapongsasuti et al. 2012). A pipeline developed by Faust et al., which combines a variety of dependency measures such as dissimilarity (e.g., Kullback–Leibler), similarity, and correlation, has been used to infer microbial interactions in the oceanic plankton community (Lima-Mendez, Faust et al. 2015).
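The dissimilarity-plus-permutation procedure can be sketched as follows. This is a minimal illustration assuming an OTU table of raw counts, given as a dictionary that maps each OTU to its abundances across the same samples.

- It computes the Bray–Curtis dissimilarity between OTU abundance profiles.
- It obtains a one-sided permutation p-value by shuffling one profile across samples. A dissimilarity lower than expected by chance is read as co-occurrence.

The function names, toy counts, and 0.05 cut-off are hypothetical. The sketch also omits steps a full pipeline would add, such as multiple-testing correction and the combination of several measures.

```python
import random


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """One-sided test for co-occurrence: is the observed dissimilarity lower
    than when the sample labels of y are randomly shuffled?"""
    rng = random.Random(seed)
    observed = bray_curtis(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if bray_curtis(x, y_perm) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


def cooccurrence_edges(otu_table, alpha=0.05, n_perm=999):
    """Return (otu_a, otu_b, dissimilarity, p) for every significant pair."""
    names = sorted(otu_table)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d, p = permutation_pvalue(otu_table[a], otu_table[b], n_perm)
            if p < alpha:
                edges.append((a, b, round(d, 3), p))
    return edges


# Hypothetical counts over 8 samples: OTU_A and OTU_B track each other closely.
otus = {
    "OTU_A": [0, 5, 10, 20, 40, 80, 0, 3],
    "OTU_B": [1, 6, 9, 22, 38, 79, 0, 2],
    "OTU_C": [30, 2, 0, 15, 1, 4, 25, 9],
}
print(cooccurrence_edges(otus))
```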
Correlation-based methods, which also produce undirected co-occurrence networks, are a popular alternative to dissimilarity-based network inference. Instead of a dissimilarity index, they use correlation coefficients such as Pearson's correlation coefficient or Spearman's rank correlation coefficient to detect significant pairwise associations between OTUs. Correlation-based methods have been widely used in studies of the human gut microbiome (Arumugam, Raes et al. 2011).
For example, Arumugam et al. applied a correlation-based method to combined human fecal microbiome datasets from four countries. They concluded that there are three robust human gut enterotypes that are neither nation- nor continent-specific. However, correlation-based methods for detecting dependencies between OTUs still have limitations. They can detect spurious correlations among low-abundance OTUs with values close to zero, and they are sensitive to compositionality (Chen and Li 2016). Weiss et al. (Weiss, Van Treuren et al. 2016) compared eight correlation-based methods on both synthetic and real microbiome data. They assessed each method's ability to identify a range of time-series ecological relationships and to distinguish signal from noise. Their results characterized the performance and shortcomings of each method and led to specific recommendations for the use of correlation-based methods.
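A minimal sketch of a Spearman correlation network, which also illustrates the compositionality problem noted above. Spearman's rho is computed here as the Pearson correlation of average ranks, with tied values sharing the mean of their positions. Edges are kept when |rho| is at least an arbitrary threshold of 0.6, without significance testing. The three-OTU count table is hypothetical.

The table is constructed so that OTU_A and OTU_B have uncorrelated absolute counts while a dominant OTU_C grows. Converting the counts to relative abundances then induces a strong correlation between A and B, and it flips the sign of the B–C association.

```python
def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation (undefined if either vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def to_relative(table):
    """Convert absolute counts (OTU -> per-sample counts) to proportions."""
    names = list(table)
    n_samples = len(table[names[0]])
    totals = [sum(table[o][s] for o in names) for s in range(n_samples)]
    return {o: [table[o][s] / totals[s] for s in range(n_samples)] for o in names}


def correlation_edges(table, threshold=0.6):
    """Undirected edges (otu_a, otu_b, rho) with |rho| >= threshold."""
    names = sorted(table)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = spearman(table[a], table[b])
            if abs(rho) >= threshold:
                edges.append((a, b, round(rho, 3)))
    return edges


# Hypothetical absolute counts in 4 samples.
absolute = {
    "OTU_A": [10, 20, 10, 20],
    "OTU_B": [10, 10, 20, 20],
    "OTU_C": [80, 170, 370, 960],
}
relative = to_relative(absolute)

print(spearman(absolute["OTU_A"], absolute["OTU_B"]))  # 0.0
print(spearman(relative["OTU_A"], relative["OTU_B"]))  # about 0.833
print(correlation_edges(absolute))
print(correlation_edges(relative))
```

On absolute counts only the B–C edge passes the threshold. On relative abundances all three pairs do, and B–C becomes negative. Each of these differences comes purely from the closure (sum-to-one) constraint.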
Although correlation-based methods for inferring interaction networks are fast, they cannot capture more complex forms of interaction involving multiple microbes (Faust, Sathirapongsasuti et al. 2012). Regression-based methods shift the calculation strategy from pairwise comparisons to multiple relationships by predicting the abundance of one OTU from the combined abundances of the other OTUs. Regression-based methods are frequently used in biological fields, but their results are often difficult to interpret biologically. For example, an interaction between OTUs inferred by a regression-based method does not necessarily reflect an underlying biological relationship. In addition, regression-based methods are prone to overfitting, which increases the number of false positives. These false positives can be reduced by using cross-validation and sparse regression.
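A minimal sketch of the sparse-regression idea, assuming an abundance table that has already been transformed to a suitable scale.

- The abundance of one target OTU is regressed on all other OTUs with an L1 (lasso) penalty, solved by coordinate descent.
- The penalty strength is selected by k-fold cross-validation.
- Predictors that keep non-zero coefficients become candidate edges to the target.

The data are synthetic standard-normal values, and the function names and lambda grid are hypothetical. A real analysis would more likely use an established implementation, such as scikit-learn's `LassoCV`, and would repeat the fit with each OTU as the target.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Columns and y are centred (not scaled); the intercept is not penalised."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for beta = 0
    beta = [0.0] * p
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of predictor j with the partial residual (beta_j added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def cv_error(X, y, lam, k=5):
    """Mean squared prediction error of lasso(lam) under k-fold cross-validation."""
    n = len(X)
    err = 0.0
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        b0, b = lasso([X[i] for i in train], [y[i] for i in train], lam)
        for i in range(f, n, k):  # held-out fold
            pred = b0 + sum(bj * xj for bj, xj in zip(b, X[i]))
            err += (y[i] - pred) ** 2
    return err / n


def select_lambda(X, y, grid):
    """Penalty from `grid` with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(X, y, lam))


# Hypothetical data: 60 samples, 5 candidate predictor OTUs; the target depends
# only on OTU 0 (positively) and OTU 2 (negatively).
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [2.0 * row[0] - 1.5 * row[2] + rng.gauss(0, 0.3) for row in X]
lam = select_lambda(X, y, [0.01, 0.05, 0.1, 0.2, 0.5])
b0, b = lasso(X, y, lam)
print(lam, [round(v, 2) for v in b])
```

The L1 penalty shrinks weak coefficients towards, and often exactly to, zero. The three non-contributing OTUs are therefore expected to receive zero or near-zero coefficients, which is how sparsity limits false-positive edges.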
These computational network inference methods infer microbial interactions from OTU abundance profiles. Some are designed to infer static networks from microbial profiles and do not take the temporal aspect into account. Others are specifically designed to capture the dynamic aspects of OTU dependencies. Microbial populations are not fixed but vary over time. The dynamic properties characterized from temporal microbiome data should therefore shed light on causal dependencies between OTUs in a community.
1.4.4 From static network to dynamic network: time-series data
Inferring microbial interactions with the network methods above can be regarded as static modeling of microbial communities. A static network provides a “snapshot” of the community at a given time point. However, several phenomena, such as community dynamics, perturbation, and succession, cannot be studied with static models alone. They can be studied further with specifically designed dynamic models, which require time-series data describing how microbial populations change over time. In macro-ecological systems, dynamic models have been widely used to study the stability and development of communities (Holling 1973, May 1974, Ives and Carpenter 2007). The recent availability of time-series microbiome data generated by NGS technologies makes it possible to apply dynamic models to microbial communities and to address questions of community dynamics in micro-ecosystems.
Studies have successfully inferred microbial interactions by analyzing time-series high-throughput 16S rRNA sequencing data (Faust and Raes 2012). Longitudinal approaches are informative because they capture the dynamics of communities and the response of bacterial community structure to external perturbations (Dethlefsen, Huse et al. 2008). Monitoring changes in all bacterial populations over time offers further potential to discover interaction mechanisms between bacterial members, such as how commensal bacteria in the intestine resist invasive pathogenic species (Stein, Bucci et al. 2013). These studies show the value of longitudinal measurements, cross-sectional measurements, or both. Such data can yield a set of significant dependencies and support dynamic models that predict the interaction mechanisms of microbial communities.
A dynamic model consists of a set of Boolean functions or differential equations (Hecker, Lambeck et al. 2009) that describe how the abundances of community members change over time. Boolean functions describe only the presence or absence of bacteria, whereas differential equations describe changes in absolute or relative abundance over time. Dynamic modeling has a long history in single-population ecology (May and Jaenike 1973). However, few approaches have attempted to include multiple species in dynamic models of microbial communities.
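The difference between the two model types can be illustrated on small hypothetical communities. The sketch below first iterates invented Boolean rules that track only presence or absence. It then integrates a differential-equation model with a forward-Euler step.

For the differential-equation part it assumes the generalized Lotka–Volterra (gLV) form used in the studies discussed next:

dx_i/dt = x_i (mu_i + sum_j a_ij x_j)

Here x_i is the abundance of member i, mu_i its intrinsic growth rate, and a_ij the effect of member j on member i. All rules and parameter values are illustrative. Forward Euler is chosen for brevity rather than accuracy.

```python
def boolean_step(state):
    """One synchronous update of hypothetical presence/absence rules."""
    return {
        "A": state["A"] and not state["C"],  # C excludes A
        "B": state["A"],                      # B depends on A (e.g. cross-feeding)
        "C": state["C"] and not state["B"],  # B excludes C
    }


def run_boolean(state, steps):
    """Return the list of states visited, including the initial one."""
    states = [state]
    for _ in range(steps):
        state = boolean_step(state)
        states.append(state)
    return states


def glv_euler(x0, mu, A, dt=0.01, steps=5000):
    """Forward-Euler integration of dx_i/dt = x_i * (mu_i + sum_j A[i][j] * x_j)."""
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        growth = [mu[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * x[i] * growth[i] for i in range(n)]
    return x


states = run_boolean({"A": True, "B": False, "C": True}, steps=3)
for s in states:
    print(s)

# Two competitors: self-limitation on the diagonal, weaker mutual inhibition off it.
x_final = glv_euler([0.1, 0.1], mu=[1.0, 0.8], A=[[-1.0, -0.5], [-0.4, -1.0]])
print([round(v, 3) for v in x_final])
```

With these parameters the two competitors settle at the coexistence equilibrium where mu + A x = 0, i.e. x = (0.75, 0.5). The Boolean rules, by contrast, can only report that all three members eventually disappear, not how fast or in what quantities.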
Mounier et al. (Mounier, Monnet et al. 2008) included multiple microbial species in their analysis, using generalized Lotka–Volterra (gLV) equations to model interactions in a cheese fermentation community. In their study, the model predicted negative interactions among three yeast species, and the authors conducted further co-culture experiments to confirm these negative relationships. The key feature that makes gLV models widely used for constructing microbial interaction networks is that their parameters directly capture the growth rate of each species and the pairwise interactions between all species in the community (Alshawaqfeh, Serpedin et al. 2017, Cao, Gibson et al. 2017). A variety of studies have applied gLV models to infer microbial interaction networks from 16S rRNA sequencing time series. For example, Stein et al. (Stein, Bucci et al. 2013) extended the gLV equations to study the mechanism of C. difficile colonization in mice after antibiotic perturbation. They inferred that the genera Akkermansia, Blautia, and Coprobacillus had inhibitory interactions with C. difficile. In contrast, Enterococcus and Mollicutes could positively affect the growth of C. difficile, while the genus Barnesiella was predicted to inhibit the growth of the genus Enterococcus. These results demonstrate that gLV models can be applied to high-resolution time-series data to infer community structure and responses to external perturbations. gLV models have also been applied to characterize microbial interactions in a variety of other studies (Mounier, Monnet et al. 2008, Fisher and Mehta 2014, Marino, Baxter et al. 2014, Buffie, Bucci et al. 2015, Bucci, Tzen et al. 2016).
1.4.5 Current pitfalls of network inference
Although gLV models are the most popular differential equations in ecological modeling, their use for microbial communities consisting of many species is still in its infancy and faces a variety of challenges. For example, these models must handle the large number of species, or groups of species, found in real-world microbial communities. A growing number of studies apply network theory to decipher interactions in microbial ecosystems, yet little attention has been paid to dealing with the high complexity of microbial communities. In addition, there is still a large gap between computational theory and application because reliable validation systems are lacking, mainly owing to the lack of benchmark metagenomic datasets (Berry and Widder 2014). For example, gLV models are widely applied to address a variety of biological questions by inferring microbial interactions (Mounier, Monnet et al. 2008, Stein, Bucci et al. 2013). However, the validation of new analytical algorithms and pipelines must rely on simulated data. Network inference approaches therefore still leave a gap between analytical theory and biological application.