2019
DOI: 10.1101/711317
Preprint

Dirichlet-multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data

Abstract: Molecular ecology regularly requires the analysis of count data that reflect the relative abundance of features of a composition (e.g., taxa in a community, gene transcripts in a tissue). The sampling process that generates these data can be modeled using the multinomial distribution. Replicate multinomial samples inform the relative abundances of features in an underlying Dirichlet distribution. These distributions together form a hierarchical model for relative abundances among replicates and sampling groups.…
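
The hierarchy described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes each replicate's relative abundances are drawn from a group-level Dirichlet distribution and that the observed counts are then a multinomial sample of those abundances. The function names, the `alpha` values, and the read depth are hypothetical choices made only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one vector of relative abundances from Dirichlet(alpha)."""
    # A Dirichlet draw equals independent Gamma(alpha_i, 1) draws, normalized.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_multinomial(n_reads, proportions, rng):
    """Draw feature counts for one replicate with n_reads observations."""
    counts = [0] * len(proportions)
    features = range(len(proportions))
    for idx in rng.choices(features, weights=proportions, k=n_reads):
        counts[idx] += 1
    return counts


def simulate_group(alpha, n_replicates, n_reads, seed=0):
    """Simulate replicate count vectors for one sampling group."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Each replicate gets its own abundances from the shared Dirichlet.
        proportions = sample_dirichlet(alpha, rng)
        replicates.append(sample_multinomial(n_reads, proportions, rng))
    return replicates


# Hypothetical group with four features; larger alpha means higher expected abundance.
counts = simulate_group(alpha=[8.0, 4.0, 2.0, 1.0], n_replicates=5, n_reads=1000)
for row in counts:
    print(row)
```

Each printed row is one replicate. Every row sums to the read depth, while the split among features varies between replicates because the underlying proportions are themselves random draws.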

Cited by 9 publications (12 citation statements)
References 79 publications
“…We note that if extreme sequencing depth is employed, such as what can be obtained through the Illumina NovaSeq platform, it may be possible to use much less ISD and still achieve satisfactory results. We also suggest that a modeling approach to estimate proportions from count data for all sequenced features should allow much lower input of ISD than would estimation of proportions following rarefaction, because accurate estimates of proportions can be modeled given few observations (Harrison, Calder, Shastry, & Buerkle, 2020). Also, we note that if a cellular ISD is used for metabarcoding studies it is wise to consider the CNV of the focal loci when performing concentration calculations prior to ISD addition (see Stämmler et al., 2016).…”
Section: How Much Internal Standard Should Be Included In Samples?
Citation type: mentioning
confidence: 99%
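
The point in the statement above, that proportions can be estimated from few observations, can be illustrated with the simplest conjugate case. This is a sketch under stated assumptions, not the method used in the cited work: it treats a single count vector as multinomial with a Dirichlet prior, in which case the posterior mean of each proportion is (count + prior) / (total count + total prior). The function names, the observed counts, and the uniform prior are hypothetical.

```python
def posterior_mean_proportions(counts, prior_alpha):
    """Posterior mean of multinomial proportions under a Dirichlet prior."""
    if len(counts) != len(prior_alpha):
        raise ValueError("counts and prior_alpha must have the same length")
    total = sum(counts) + sum(prior_alpha)
    return [(c + a) / total for c, a in zip(counts, prior_alpha)]


def raw_proportions(counts):
    """Plain count / total estimate, for comparison."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical sample with only four observations across four features.
observed = [3, 1, 0, 0]
prior = [1.0, 1.0, 1.0, 1.0]  # assumed uniform Dirichlet prior

post = posterior_mean_proportions(observed, prior)
raw = raw_proportions(observed)
print(post)
print(raw)
```

With so few observations the raw estimate assigns exactly zero to the unobserved features, whereas the posterior mean keeps a small positive proportion for them. A hierarchical model would additionally share information among replicates, which this single-sample sketch does not do.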
“…The bulk of the variation they observed was assigned to technical causes. We suggest that Bayesian models are an exciting possibility for partitioning variation in sequence data, in part because they make full use of the data and can incorporate hierarchical model structures to share information among all replicates within a sampling group (sensu Fordyce, Gompert, Forister, & Nice, 2011; Harrison et al., 2020). This is in contrast to rarefaction methods, which discard observed data and thus provide potentially misleading information about technical and biological variation among samples (McMurdie & Holmes, 2014).…”
Section: Internal Standards Are Not a Panacea For All The Ills Of Seq…
Citation type: mentioning
confidence: 99%
“…We analyzed sequence count data via a hierarchical Bayesian modeling (HBM) framework that provides estimates of proportional relative abundance for each microbial taxon [67,68].…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…All scripts and processed data used for this manuscript are available at https://github.com/JHarrisonEcoEvo/DMM (Harrison, Calder, Shastry, & Buerkle), and a snapshot corresponding to the status at publication at Zenodo (10.5281/zenodo.3558682). Data from Duvallet et al. can be downloaded from https://doi.org/10.5281/zenodo.2678108.…”
Section: Data Availability Statement
Citation type: mentioning
confidence: 99%