Different organisms in a microbial community may drastically affect each other’s growth phenotypes and thereby the community dynamics, with important implications for human and environmental health. Novel culturing methods and the decreasing costs of sequencing will gradually enable high-throughput measurements of pairwise interactions in systematic coculturing studies. However, a thorough characterization of all interactions that occur within a microbial community is greatly limited both by the combinatorial complexity of possible assortments and by the limited biological insight that interaction measurements typically provide without laborious specific follow-ups. Here, we show how a simple and flexible formal representation of microbial pairs can be used for the classification of interactions via machine learning. The approach we propose predicts with high accuracy the outcome of yet-to-be-performed experiments and generates testable hypotheses about the mechanisms of specific interactions.
bioRxiv preprint first posted online Mar. 21, 2018; doi: http://dx.doi.org/10.1101/286641
Abstract
Microbes affect each other's growth in multiple, often elusive ways. The ensuing interdependencies form complex networks, believed to influence taxonomic composition, as well as community-level functional properties and dynamics. Elucidation of these networks is often pursued by measuring pairwise interaction in co-culture experiments. However, combinatorial complexity precludes the exhaustive experimental analysis of pairwise interactions even for moderately sized microbial communities. Here, we use a machine-learning random forest approach to address this challenge. In particular, we show how partial knowledge of a microbial interaction network, combined with trait-level representations of individual microbial species, can provide accurate inference of missing edges in the network and putative mechanisms underlying interactions. We applied our algorithm to two case studies: an experimentally mapped network of interactions between auxotrophic E. coli strains, and a large in silico network of metabolic interdependencies between 100 human gut-associated bacteria. For this last case, 5% of the network is enough to predict the remaining 95% with 80% accuracy, and mechanistic hypotheses produced by the algorithm accurately reflect known metabolic exchanges. Our approach, broadly applicable to any microbial or other ecological network, can drive the discovery of new interactions and new molecular mechanisms, both for therapeutic interventions involving natural communities and for the rational design of synthetic consortia.
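The setup described in this abstract (trait vectors per species, a partially measured network, and the rest of the edges left to predict) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pair_features` and `split_known_edges`, the toy trait values, the choice to represent a pair by concatenating trait vectors, and the treatment of pairs as unordered are all hypothetical.

```python
import random
from itertools import combinations

def pair_features(traits_a, traits_b):
    """Represent a pair by concatenating the two species' trait vectors
    (an assumed encoding, chosen here for simplicity)."""
    return list(traits_a) + list(traits_b)

def split_known_edges(species, known_fraction, seed=0):
    """Randomly mark a fraction of all unordered pairs as 'measured';
    the remaining pairs are the ones a classifier would have to predict."""
    pairs = list(combinations(species, 2))
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_known = int(known_fraction * len(pairs))  # truncate to a whole number of pairs
    return pairs[:n_known], pairs[n_known:]

# Hypothetical binary traits, e.g. presence/absence of a metabolic capability.
traits = {
    "sp1": [1, 0, 1],
    "sp2": [0, 1, 1],
}
x = pair_features(traits["sp1"], traits["sp2"])

# 100 species, as in the in silico case study; 5% of pairs treated as measured.
species = [f"sp{i}" for i in range(1, 101)]
known, unknown = split_known_edges(species, known_fraction=0.05)
n_pairs = len(known) + len(unknown)
print(n_pairs, len(known), len(unknown))
```

With 100 species there are 4,950 unordered pairs, which illustrates the combinatorial burden the abstract refers to. In a full pipeline, a classifier such as a random forest would be fitted on the feature vectors of the measured pairs and then applied to the unmeasured ones.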
Machine learning is helping the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g., from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While, in many instances, treating an algorithm as a black box is sufficient, it is interesting to pursue an enhanced understanding of how system variables end up contributing to a specific output, as an avenue toward new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest algorithm to identify combinations of variables (“rules”) frequently used for classification. We first apply BowSaw to a simulated dataset and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn’s disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.
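The general idea of mining a forest for variables that are used together can be illustrated with a toy example. The sketch below is not the BowSaw algorithm itself, whose details are not given in this abstract; it is a simplified stand-in that counts which pairs of variables co-occur on the decision paths voting for a given class. The hand-written trees, the feature names `A`, `B`, `C`, and the helper names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def node(feature, if_zero, if_one):
    """Internal node testing one binary feature; leaves are plain class labels."""
    return {"feature": feature, 0: if_zero, 1: if_one}

# A toy "forest" written by hand for illustration (not trained on data).
forest = [
    node("A", "healthy", node("B", "healthy", "disease")),
    node("B", "healthy", node("A", "healthy", node("C", "disease", "healthy"))),
    node("C", node("A", "healthy", node("B", "healthy", "disease")), "healthy"),
]

def decision_path(tree, sample):
    """Return (features tested, predicted label) for one sample in one tree."""
    used = []
    while isinstance(tree, dict):
        used.append(tree["feature"])
        tree = tree[sample[tree["feature"]]]
    return used, tree

def count_cooccurring_pairs(forest, sample, label):
    """Count feature pairs appearing together on paths that predict `label`."""
    counts = Counter()
    for tree in forest:
        used, predicted = decision_path(tree, sample)
        if predicted == label:
            counts.update(combinations(sorted(set(used)), 2))
    return counts

sample = {"A": 1, "B": 1, "C": 0}
pair_counts = count_cooccurring_pairs(forest, sample, "disease")
print(pair_counts.most_common())
```

Here the pair (`A`, `B`) appears on every path that classifies the sample as "disease", so it would be the first candidate "rule" to examine. Extending the count from pairs to larger combinations follows the same pattern by changing the combination size.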