Jaron Thompson scite author profile

¹, Johansen², Dunbar³ et al. 2019

Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can be addressed, in part, by integrating advances in DNA sequencing technology with computational approaches like machine learning. Although machine learning techniques have been applied to microbiome data, their use remains rare, and user-friendly platforms to implement them are not widely available. We developed a tool that implements neural network and random forest models to perform regression and feature selection tasks on microbiome data. In this study, we applied the tool to analyze soil microbiome (16S rRNA gene profiles) and dissolved organic carbon (DOC) data from a 44-day plant litter decomposition experiment. The microbiome data include 1709 bacterial operational taxonomic units (OTUs) from 300+ microcosms. Regression of predicted against actual DOC for a held-out test set of 51 samples yields Pearson’s correlation coefficients of .636 and .676 for the neural network and random forest approaches, respectively. Important taxa identified by the machine learning techniques are compared with results from a standard tool widely used by microbial ecologists (indicator species analysis). Of 1709 bacterial taxa, indicator species analysis identified 285 as significant determinants of DOC concentration. Of the top 285 ranked features determined by the machine learning methods, a subset of 86 taxa is common to all feature selection techniques. Using this subset of features, predictions for random permutations of the data set are at least as accurate as predictions made using the entire feature set.
Our results suggest that integration of multiple methods can aid identification of a robust subset of taxa within complex communities that may drive specific functional outcomes of interest.
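The evaluation and consensus steps described in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' tool. The function names (pearson_r, top_k, consensus, random_split) and the toy importance scores are hypothetical. The scores stand in for whatever ranking each feature-selection method produces.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between predicted and observed values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_k(importances, k):
    """Set of the k feature names with the highest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def consensus(feature_sets):
    """Features present in every set (e.g. top-k lists from several methods)."""
    return set.intersection(*(set(s) for s in feature_sets))


def random_split(n_samples, n_test, seed=0):
    """Shuffle sample indices and hold out n_test of them as a test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


# Hypothetical importance scores from two methods over four OTUs.
nn = {"OTU_1": 0.9, "OTU_2": 0.1, "OTU_3": 0.5, "OTU_4": 0.7}
rf = {"OTU_1": 0.8, "OTU_2": 0.6, "OTU_3": 0.2, "OTU_4": 0.4}
shared = consensus([top_k(nn, 2), top_k(rf, 2)])  # {'OTU_1'}
```

A set of taxa flagged as significant by indicator species analysis can be passed to consensus alongside the top-k sets. Repeating random_split with different seeds gives the kind of random permutations of the data set the abstract mentions.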

Recurrent neural networks enable design of multifunctional synthetic human gut microbiome dynamics

Baranwal¹, Clark², ³ et al. 2022

Predicting the dynamics and functions of microbiomes constructed from the bottom up is a key challenge in exploiting them to our benefit. Current models based on ecological theory fail to capture complex community behaviors arising from higher-order interactions, and they do not scale well with increasing community complexity or when multiple functions are considered. We develop and apply a long short-term memory (LSTM) framework to advance our understanding of community assembly and health-relevant metabolite production using a synthetic human gut community. A mainstay of recurrent neural networks, the LSTM learns a high-dimensional, data-driven, nonlinear dynamical system model. We show that the LSTM model can outperform the widely used generalized Lotka-Volterra model based on ecological theory. We build methods to decipher microbe-microbe and microbe-metabolite interactions from an otherwise black-box model. These methods highlight that Actinobacteria, Firmicutes and Proteobacteria are significant drivers of metabolite production, whereas Bacteroides shape community dynamics. We use the LSTM model to navigate a large multidimensional functional landscape to design communities with unique health-relevant metabolite profiles and temporal behaviors. In sum, the accuracy of the LSTM model can be exploited for experimental planning and to guide the design of synthetic microbiomes with target dynamic functions.
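For reference, the generalized Lotka-Volterra (gLV) baseline models each species' abundance x_i as dx_i/dt = x_i (μ_i + Σ_j a_ij x_j). Here μ_i is an intrinsic growth rate and a_ij is the effect of species j on species i. The sketch below integrates that system with a forward-Euler step in plain Python. The parameter values and function names are hypothetical, and it reproduces neither the authors' parameter fitting nor their LSTM.

```python
def glv_rhs(x, mu, A):
    """dx_i/dt = x_i * (mu_i + sum_j A[i][j] * x_j) for every species i."""
    n = len(x)
    return [x[i] * (mu[i] + sum(A[i][j] * x[j] for j in range(n))) for i in range(n)]


def simulate_glv(x0, mu, A, dt=0.01, n_steps=1000):
    """Integrate the gLV system with forward Euler; returns every state visited."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(n_steps):
        dx = glv_rhs(x, mu, A)
        # Clamp at zero so an overly large step cannot produce negative abundances.
        x = [max(xi + dt * dxi, 0.0) for xi, dxi in zip(x, dx)]
        trajectory.append(x)
    return trajectory


# Two hypothetical species: each is self-limiting and species 2 inhibits species 1.
mu = [0.8, 0.5]
A = [[-1.0, -0.3],
     [0.0, -1.0]]
traj = simulate_glv([0.05, 0.05], mu, A, dt=0.01, n_steps=3000)
# Analytic equilibrium: x2 = 0.5 and x1 = 0.8 - 0.3 * 0.5 = 0.65.
```

Because every interaction enters as a single pairwise coefficient a_ij, this model family cannot directly represent the higher-order interactions the abstract cites as a limitation.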

Soil Bacterial and Fungal Richness Forecast Patterns of Early Pine Litter Decomposition

Albright¹, Johansen², ³ et al. 2020

Discovering widespread microbial processes that drive unexpected variation in carbon cycling may improve modeling and management of soil carbon (Prescott, 2010; Wieder et al., 2015a, 2018). A first step is to identify community features linked to carbon cycle variation. We addressed this challenge using an epidemiological approach with 206 soil communities decomposing Ponderosa pine litter in 618 microcosms. Carbon flow from litter decomposition was measured over a 6-week incubation. Cumulative CO2 from microbial respiration varied twofold among microcosms, and dissolved organic carbon (DOC) from litter decomposition varied fivefold, demonstrating large functional variation despite constant environmental conditions under which strong selection is expected. To investigate microbial features driving DOC concentration, two microbial community cohorts were delineated as "high" and "low" DOC. For each cohort, communities from the original soils and from the final microcosm communities after the 6-week incubation with litter were taxonomically profiled. A logistic model including total biomass, fungal richness, and bacterial richness measured in the original soils or in the final microcosm communities predicted the DOC cohort with 72% (P < 0.05) and 80% (P < 0.001) accuracy, respectively. The strongest predictors of the DOC cohort were biomass and either fungal richness (in the original soils) or bacterial richness (in the final microcosm communities). Successful forecasting of functional patterns after lengthy community succession in a new environment reveals strong historical contingencies. Forecasting future community function is a key advance beyond correlation of functional variance with end-state community features.
The importance of taxon richness, the same feature linked to carbon fate in gut microbiome studies, underscores the need for increased understanding of biotic mechanisms that can shape richness in microbial communities independent of physicochemical conditions.
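A logistic model of the kind described in this abstract could be sketched as follows. This is an illustration under stated assumptions, not the study's procedure:

- Richness is taken as the number of taxa with nonzero counts.
- Predictors are z-scored before fitting.
- The model is fit by plain batch gradient descent.

All function names and toy values below are hypothetical.

```python
import math


def richness(counts, threshold=0):
    """Observed richness: number of taxa whose count exceeds threshold."""
    return sum(1 for c in counts if c > threshold)


def standardize(X):
    """Z-score each column of a row-major feature matrix (population SD)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(X, w, b):
    """Assign class 1 (assumed here to mean 'high' DOC) when probability >= 0.5."""
    return [int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5) for xi in X]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted cohort matches the true cohort."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical rows: [total biomass, fungal richness, bacterial richness]
raw = [[2.0, 40, 300], [2.5, 35, 320], [6.0, 80, 500], [5.5, 90, 480]]
labels = [0, 0, 1, 1]
X = standardize(raw)
w, b = fit_logistic(X, labels)
train_acc = accuracy(labels, predict(X, w, b))  # training accuracy only
```

The accuracy computed here is on the training rows. A reported forecasting accuracy would normally come from held-out samples or cross-validation.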

Differences in substrate use linked to divergent carbon flow during litter decomposition

Albright¹, ², Kroeger³ et al. 2020

Discovering widespread microbial processes that create variation in soil carbon (C) cycling within ecosystems may improve soil C modeling. Toward this end, we screened 206 soil communities decomposing plant litter in a common garden microcosm environment and examined features linked to divergent patterns of C flow. C flow was measured as carbon dioxide (CO2) and dissolved organic carbon (DOC) over 44 days of litter decomposition. Two large groups of microbial communities representing ‘high’ and ‘low’ DOC phenotypes from original soil and 44-day microcosm samples were down-selected for fungal and bacterial profiling. Metatranscriptomes were also sequenced from a smaller subset of communities in each group. The two groups exhibited differences in average rate of CO2 production, demonstrating that the divergent patterns of C flow arose from innate functional constraints on C metabolism, not a time-dependent artefact. To infer functional constraints, we identified features (traits at the organism, pathway, or gene level) linked to the high and low DOC phenotypes using RNA-Seq and machine learning approaches. Substrate use differed across the high and low DOC phenotypes. Additional features suggested that divergent patterns of C flow may be driven in part by differences in organism interactions that affect DOC abundance either directly or indirectly by controlling community structure.
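The abstract does not state which test was used to compare average CO2 production rates between the high and low DOC groups. One common choice is Welch's two-sample t-test, which does not assume equal variances. The sketch below assumes each microcosm's average rate is its cumulative CO2 divided by the 44-day incubation. The function names and example values are hypothetical.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sample_var(v):
    """Unbiased sample variance (n - 1 denominator)."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)


def average_rate(cumulative_co2, days=44):
    """Average production rate over the incubation: cumulative amount / duration."""
    return cumulative_co2 / days


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = sample_var(a) / len(a), sample_var(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical cumulative CO2 per microcosm (arbitrary units) for two groups.
group_1 = [average_rate(c) for c in [88.0, 92.4, 79.2]]
group_2 = [average_rate(c) for c in [132.0, 140.8, 123.2]]
t_stat, dof = welch_t(group_1, group_2)
```

For a p-value, scipy.stats.ttest_ind(a, b, equal_var=False) computes the same Welch statistic and evaluates it against the t distribution.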

Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition

¹, Johansen², Dunbar³ et al. 2019

Preprint

Microbial communities are ubiquitous and often influence macroscopic properties of the ecosystems they inhabit. However, deciphering the functional relationship between specific microbes and ecosystem properties is an ongoing challenge owing to the complexity of the communities. This challenge can be addressed, in part, by integrating advances in DNA sequencing technology with computational approaches like machine learning. Although machine learning techniques have been applied to microbiome data, their use remains rare, and user-friendly platforms to implement them are not widely available. We developed a tool that implements neural network and random forest models to perform regression and feature selection tasks on microbiome data. In this study, we applied the tool to analyze soil microbiome (16S rRNA gene profiles) and dissolved organic carbon (DOC) data from a 44-day plant litter decomposition experiment. The microbiome data include 1709 bacterial operational taxonomic units (OTUs) from 300+ microcosms. Regression of predicted against actual DOC for a held-out test set of 51 samples yields Pearson's correlation coefficients of .636 and .676 for the neural network and random forest approaches, respectively. Important taxa identified by the machine learning techniques are compared with results from a standard tool widely used by microbial ecologists (indicator species analysis). Of 1709 bacterial taxa, indicator species analysis identified 285 as significant determinants of DOC concentration. Of the top 285 ranked features determined by the machine learning methods, a subset of 86 taxa is common to all feature selection techniques. Using this subset of features, predictions for random permutations of the data set are at least as accurate as predictions made using the entire feature set.
Our results suggest that integration of multiple methods can aid identification of a robust subset of taxa within complex communities that may drive specific functional outcomes of interest.

Introduction

Microbial communities mediate essential functions in diverse ecosystems. While the microbiome controls many interesting macroscopic properties, elucidating the relationship between specific microbes and ecosystem functions remains a complex problem in ecology. Recent advances in DNA sequencing technology make it easy to acquire metagenomic data representing the taxonomic profile of bacteria and fungi in microbial communities. This opens the door to deciphering which components of the microbiome can drive changes in macroscopic properties. However, analysis of metagenomic microbial data poses several difficulties. The data are typically high dimensional (many taxa) with a small number of samples collected in each study. Additionally, sequencing results are noisy and yield sparse data sets [1]. Machine learning techniques provide a means to analyze high-dimensional data [2, 3] and could be used to elucidate relationships be...