2019
DOI: 10.1093/biomet/asz062

Multisample estimation of bacterial composition matrices in metagenomics data

Abstract: Metagenomics sequencing is routinely applied to quantify bacterial abundances in microbiome studies, where the bacterial composition is estimated based on the sequencing read counts. Due to limited sequencing depth and DNA dropouts, many rare bacterial taxa might not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation using count normalization leads to many zero proportions, which tend to result in inaccurate estimates of bacterial abundance and diversity.…
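
To make the zero-proportion problem concrete, here is a minimal sketch (assuming NumPy) of the naive count normalization the abstract describes, placed next to a constant-pseudocount fix. The pseudocount is a common ad hoc workaround swapped in for illustration only. It is not the multisample estimator this paper proposes, and the toy count matrix is made up.

```python
import numpy as np

def naive_composition(counts):
    """Row-normalize a samples-by-taxa count matrix.

    Taxa with zero reads get a proportion of exactly zero.
    """
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def pseudocount_composition(counts, pseudocount=0.5):
    """Add a constant pseudocount before normalizing so every proportion is positive.

    Illustrative ad hoc fix (assumed value 0.5); NOT the paper's multisample method.
    """
    counts = np.asarray(counts, dtype=float) + pseudocount
    return counts / counts.sum(axis=1, keepdims=True)

# Hypothetical toy data: 2 samples (rows) by 4 taxa (columns).
counts = np.array([[120, 30, 0, 0],
                   [90, 0, 5, 5]])

p = naive_composition(counts)        # row 0 -> [0.8, 0.2, 0.0, 0.0]
q = pseudocount_composition(counts)  # all entries strictly positive
print(p)
print(q)
```

With the naive estimate, unobserved taxa get a proportion of exactly zero, so anything downstream that takes logarithms or ratios of proportions breaks. The pseudocount avoids that, but the choice of constant is arbitrary and each sample is still estimated on its own.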

Cited by 29 publications (49 citation statements)
References 45 publications
“…Our computational data analysis workflow, available on GitHub and as Synapse project (see the ‘Data Availability’ section), is fully reproducible, provides all novel shrinkage estimators introduced here and allows easy extension and comparison to additional data normalization, estimation and downstream analysis tasks. For instance, future work could include the integration of more advanced zero-replacement strategies ( 54 , 55 ), application of popular data normalization schemes from single-cell data analysis ( 56 ) or the application of other correlation ( 21 , 48 ) or proportionality estimators, including those available in the propr package ( 23 ). Here, rather than using universal thresholding for sparsifying associations, more advanced selection strategies that control false discovery rates [as available in the propr package ( 23 )] may improve the sample size consistency of the microbial association inference workflows.…”
Section: Discussion, mentioning (confidence: 99%)

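The workflow quoted above chains zero replacement, data normalization, an association estimator and universal thresholding for sparsification. A minimal sketch of such a chain, assuming NumPy: a pseudocount stands in for zero replacement, the centered log-ratio (CLR) transform and Pearson correlation stand in for normalization and estimation, and `tau` is a single hypothetical cutoff applied to every off-diagonal entry. None of these choices is taken from the cited work, whose shrinkage estimators are not reproduced here, and the simulated counts are synthetic.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform after simple pseudocount zero replacement.

    The pseudocount (assumed 1.0) is an illustrative choice, not one of the
    'more advanced' zero-replacement strategies mentioned in the quoted text.
    """
    x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return x - x.mean(axis=1, keepdims=True)  # center each sample's log counts

def thresholded_association(counts, tau):
    """Taxa-by-taxa correlation of CLR data, sparsified by universal hard thresholding.

    One cutoff tau (hypothetical parameter) is used for every off-diagonal entry:
    entries with |r| < tau are set to zero; the diagonal is kept at 1.
    """
    z = clr(counts)
    r = np.corrcoef(z, rowvar=False)  # columns are taxa
    sparse = np.where(np.abs(r) >= tau, r, 0.0)
    np.fill_diagonal(sparse, 1.0)
    return sparse

# Synthetic example: 40 samples, 4 taxa with decreasing mean abundance.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[50, 20, 5, 1], size=(40, 4))
a = thresholded_association(counts, tau=0.3)
print(a)
```

A single global cutoff is easy to apply but ignores how many associations are tested; the quoted passage suggests false-discovery-rate control (as in the propr package) as a more principled alternative.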
“…Our computational data analysis workflow, available on GitHub and as synapse project (see Data Availability), is fully reproducible, provides all novel shrinkage estimators introduced here, and allows easy extension and comparison to additional data normalization, estimation, and downstream analysis tasks. For instance, future work could include the integration of more advanced zero-replacement strategies (51,52), application of popular data normalization schemes from single-cell data analysis (53) or the application of other correlation (21,46) or proportionality estimators, including those available in the propr package (23). Here, rather than using universal thresholding for sparsifying associations, more advanced selection strategies that control false discovery rates (as available in the propr package (23)) may improve the consistency of the microbial association inference workflows.…”
Section: Discussion, mentioning (confidence: 99%)

“…Dimension reduction methods optimized for count data that apply a better-fitting likelihood model (e.g., Poisson or negative binomial) are promising for addressing the skewed distribution of sc count data (8,14). However, glmPCA (8), Poisson factorization (34)(35)(36), and probabilistic count matrix factorization [pCMF, (37)], as well as methods designed to model zero-inflated sparse data, including ZIFA and ZINB-WaVE (38,39) did not outperform PCA across the full range of analyses and evaluations performed in the study Sun et al (30). While there are particular settings where these methods may be most appropriate, they are not necessarily appropriate as "general-purpose" approaches.…”
Section: Dimension Reduction, mentioning (confidence: 99%)