2016
DOI: 10.1371/journal.pcbi.1004977

Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

Abstract: Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive …

Cited by 512 publications (587 citation statements)
References: 64 publications
“…The result demonstrated that microbiome composition indeed could be useful for crop productivity prediction. While the prediction model with a small training set resulted in lower accuracy compare to machine learning prediction in human diseases (generally included 100–300 samples; Pasolli et al, 2016), we expect the accuracy would be improved with a larger sample size. Nonetheless, because soil microbiome could be far more complicated and diverse than human microbiome, limited sequencing depth to detect rare taxa and the reproducibility under the challenge of highly varied environmental factors will be the technical bottlenecks.…”
Section: Discussion (mentioning)
confidence: 89%
“…We next applied StrainPhlAn to a set of 1590 gut metagenomes from adult subjects retrieved from nine public data sets (Table 1) that we preprocessed using uniform quality control criteria (Methods) as in Pasolli et al (2016). The resulting population spanned all continents except Australia and Antarctica, with curated common metadata including country of origin, health or disease state, age, and BMI (other metadata was either not provided or not common among data sets).…”
Section: Results (mentioning)
confidence: 99%
“…Comparatively, strain level profiles, often containing hundreds of thousands of gene markers' information, should be more informative for accurately classifying the samples into patient and healthy control groups across different types of diseases than abundance profiles that usually contain only a few hundred bacteria's abundance information [11]. Lastly, to evaluate and compare the performance of machine learning models, it is necessary to introduce a rigorous validation framework to estimate their performance over unseen data.…”
Section: Introduction (mentioning)
confidence: 99%
“…Lastly, to evaluate and compare the performance of machine learning models, it is necessary to introduce a rigorous validation framework to estimate their performance over unseen data. Pasolli et al utilized a 10-fold cross-validation scheme that optimizes hyper-parameters of their model using a test fold as well as 9 training folds and selects the best result as its performance [11]. This approach may overestimate model performance as it allows to tune the model against test data [12,13].…”
Section: Introduction (mentioning)
confidence: 99%
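
The concern raised in the last statement, that tuning hyper-parameters against the test fold can inflate the reported score, is commonly addressed with nested cross-validation. The sketch below illustrates that general idea only; it is not the procedure used by Pasolli et al. or by the citing paper. The 1-D toy data, the k-nearest-neighbour classifier, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter

def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points whose label the k-NN rule predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def split(data, held_out):
    """Return (train, test) where test holds the items at the held-out indices."""
    held = set(held_out)
    train = [d for i, d in enumerate(data) if i not in held]
    return train, [data[i] for i in held_out]

def cv_score(data, k, n_folds, seed):
    """Plain k-fold cross-validated accuracy for one hyper-parameter value."""
    scores = [accuracy(*split(data, fold), k)
              for fold in kfold_indices(len(data), n_folds, seed)]
    return sum(scores) / len(scores)

def nested_cv(data, grid, outer_folds=5, inner_folds=4, seed=0):
    """Outer folds estimate performance; inner folds choose the hyper-parameter."""
    outer_scores = []
    for fold in kfold_indices(len(data), outer_folds, seed):
        train, test = split(data, fold)
        # Selection only ever sees the outer training portion, never `test`.
        best_k = max(grid, key=lambda k: cv_score(train, k, inner_folds, seed + 1))
        outer_scores.append(accuracy(train, test, best_k))
    return sum(outer_scores) / len(outer_scores)

# Hypothetical toy data: two well-separated classes on a line.
data = [(i / 10, 0) for i in range(20)] + [(10 + i / 10, 1) for i in range(20)]
score = nested_cv(data, grid=[1, 3, 5])
print(f"nested CV accuracy: {score:.2f}")
```

Because the hyper-parameter is chosen inside each outer training portion, the outer test fold stays unseen until the final scoring step, so the averaged outer score is not tuned against the data it is measured on.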