2020
DOI: 10.1016/j.csbj.2020.07.020

Method development for cross-study microbiome data mining: Challenges and opportunities

Abstract: During the past decade, a tremendous amount of microbiome sequencing data has been generated to study the dynamic associations between microbial profiles and environments. How to precisely and efficiently decipher large-scale microbiome data and take further advantage of it has become one of the most pressing bottlenecks in microbiome research. In this mini-review, we focus on the three key steps of analyzing cross-study microbiome datasets, including microbiome profiling, data integratin…

Citations: cited by 27 publications (19 citation statements)
References: 103 publications
“…In this work, we introduce Microbiome Search Engine 2 (MSE 2), which features (i) an expanded database of over 250,000 shotgun metagenomic and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies and (ii) an enhanced search engine for real-time and fast (<0.5 s per query) searches for best-matched microbiomes via not just taxonomic but also functional profiles. The value of a search-based strategy has been demonstrated for defining the novelty of microbiome samples (21) and for cross-cohort disease diagnosis (22, 24). By adding a function-based dimension for these and related applications, MSE 2 should accelerate large-scale mining of the ever-expanding metagenome data space.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
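
The idea of searching a database for the best-matched microbiome can be illustrated with a small sketch. This is not the MSE 2 implementation: the similarity measure here is plain Bray-Curtis similarity on relative abundances, chosen only because it is a common ecological measure, and the function names (`normalize`, `bray_curtis_similarity`, `search`) and the toy sample IDs are hypothetical.

```python
def normalize(profile):
    """Scale raw feature counts so they sum to 1 (relative abundance)."""
    total = sum(profile.values())
    if total == 0:
        raise ValueError("profile has no counts")
    return {feature: count / total for feature, count in profile.items()}


def bray_curtis_similarity(a, b):
    """Return 1 minus the Bray-Curtis dissimilarity of two profiles."""
    features = set(a) | set(b)
    shared = sum(min(a.get(f, 0.0), b.get(f, 0.0)) for f in features)
    return 2.0 * shared / (sum(a.values()) + sum(b.values()))


def search(query, database, top_k=1):
    """Rank database samples by similarity to the query profile.

    `database` maps a sample ID to a dict of feature counts
    (an assumed toy layout, not a real database schema).
    """
    q = normalize(query)
    scored = [
        (bray_curtis_similarity(q, normalize(profile)), sample_id)
        for sample_id, profile in database.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(sample_id, score) for score, sample_id in scored[:top_k]]


# Toy data, invented for illustration only.
database = {
    "gut_1": {"Bacteroides": 60, "Prevotella": 40},
    "soil_1": {"Streptomyces": 90, "Bacillus": 10},
}
best_matches = search({"Bacteroides": 30, "Prevotella": 20}, database)
```

A linear scan like this costs one comparison per database sample; a real-time engine over hundreds of thousands of samples would need indexing on top of whatever similarity measure it uses.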
“…Microbiome-based disease detection can be considered as a classification problem using microbial profiles, which are parsed from DNA sequences by bioinformatics tools such as UPARSE [5], QIIME/QIIME2 [29], [30], Parallel-Meta3 [31], MetaPhlAn2 [6], HUMAnN2 [7], Kraken [32], according to the sequencing method and type [3]. Given microbiome profiles for n samples ( is the microbial profile of a sample that can be represented by normalized richness of features like species, OTU, function, etc.)…”
Section: Single-label Classification In Microbiome Studies (citation type: mentioning)
confidence: 99%
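
The framing of disease detection as classification over normalized profiles can be made concrete with a minimal sketch. The classifier below is a nearest-centroid rule, picked only for brevity; it is not the method of the citing paper, and the function names, labels, and toy counts are all hypothetical.

```python
import math
from collections import defaultdict


def to_relative_abundance(counts):
    """Normalize a vector of feature counts so that it sums to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return [c / total for c in counts]


def fit_centroids(profiles, labels):
    """Average the normalized profiles of each class into one centroid."""
    sums = {}
    sizes = defaultdict(int)
    for counts, label in zip(profiles, labels):
        x = to_relative_abundance(counts)
        previous = sums.get(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(previous, x)]
        sizes[label] += 1
    return {
        label: [s / sizes[label] for s in vector]
        for label, vector in sums.items()
    }


def predict(centroids, counts):
    """Assign the label whose centroid is closest in Euclidean distance."""
    x = to_relative_abundance(counts)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Toy training data: rows are samples, columns are three made-up features.
train_profiles = [[90, 10, 0], [80, 20, 0], [10, 20, 70], [0, 30, 70]]
train_labels = ["healthy", "healthy", "disease", "disease"]
centroids = fit_centroids(train_profiles, train_labels)
prediction = predict(centroids, [85, 15, 0])
```

Each sample gets exactly one label here, which matches the single-label setting named in the section heading.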
“…Microbiome analysis characterizes the dynamics of complex microbial communities and thus provides opportunities to investigate the associations between microbial profiles and human diseases [1], [2], [3]. In recent years, the scale of publicly available microbiome data has been increasing rapidly due to high-throughput sequencing.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…For those tools that do exist, it can be difficult to know which to select, as some will be generalizable to all data types and experiments, while others will depend on the particular questions under investigation [31]. Generally, there is much more software tailored for multi-omic analysis of either the host or microbiome in isolation [36, 74–78], as opposed to tools for integrating datasets from both simultaneously [79, 80]. For instance, gNOMO is a bioinformatic pipeline that is specifically designed to process and analyze non-model organism samples of up to three meta-omics levels (metagenomics, metatranscriptomics, and metaproteomics) in an integrative manner [81], but analysis does not extend to the host.…”
Section: Introduction (citation type: mentioning)
confidence: 99%