The accurate extraction of species‐abundance information from DNA‐based data (metabarcoding, metagenomics) could contribute usefully to diet analysis and food‐web reconstruction, the inference of species interactions, the modelling of population dynamics and species distributions, the biomonitoring of environmental state and change, and the inference of false positives and negatives. However, multiple sources of bias and noise in sampling and processing combine to inject error into DNA‐based data sets. To understand how to extract abundance information, it is useful to distinguish two concepts. (i) Within‐sample across‐species quantification describes relative species abundances in one sample. (ii) Across‐sample within‐species quantification describes how the abundance of each individual species varies from sample to sample, such as over a time series, an environmental gradient or different experimental treatments. First, we review the literature on methods to recover across‐species abundance information (by removing what we call “species pipeline biases”) and within‐species abundance information (by removing what we call “pipeline noise”). We argue that many ecological questions can be answered with just within‐species quantification, and we therefore demonstrate how to use a “DNA spike‐in” to correct for pipeline noise and recover within‐species abundance information. We also introduce a model‐based estimator that can be used on data sets without a physical spike‐in to approximate and correct for pipeline noise.
The accurate extraction of species-abundance information from DNA-based data (metabarcoding, metagenomics) could contribute usefully to diet reconstruction and quantitative food webs, the inference of species interactions, the modelling of population dynamics and species distributions, the biomonitoring of environmental state and change, and the inference of false positives and negatives. However, capture bias, capture noise, species pipeline biases, and pipeline noise all combine to inject error into DNA-based datasets. We focus on methods for correcting the latter two error sources, as the first two are addressed extensively in the ecological literature. To extract abundance information, it is useful to distinguish two concepts. (1) Across-species quantification describes relative species abundances within one sample. (2) In contrast, within-species quantification describes how the abundance of each individual species varies from sample to sample, as in a time series, an environmental gradient, or different experimental treatments. Firstly, we review methods to remove species pipeline biases and pipeline noise. Secondly, we demonstrate experimentally (with a detailed protocol) how to use a 'DNA spike-in' to remove pipeline noise and recover within-species abundance information. We also introduce a statistical estimator that can partially remove pipeline noise from datasets that lack a physical DNA spike-in.
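As a rough illustration of the spike-in idea described above, the sketch below assumes that the same amount of spike-in DNA was added to every sample, so that each sample's spike-in read count tracks that sample's share of pipeline noise. Dividing each species' reads by the spike-in reads of the same sample then gives values that can be compared across samples within a species. The function name `spike_in_correct`, the toy read counts, and the plain ratio are assumptions made for illustration; they are not taken from the paper's protocol.

```python
def spike_in_correct(otu_reads, spike_reads):
    """Scale per-species read counts by each sample's spike-in read count.

    otu_reads:   {sample: {species: read_count}}
    spike_reads: {sample: spike_in_read_count}
    Returns a dict of the same shape as otu_reads holding reads per spike-in read.
    """
    corrected = {}
    for sample, counts in otu_reads.items():
        spike = spike_reads[sample]
        if spike <= 0:
            # Without spike-in reads the sample cannot be rescaled.
            raise ValueError(f"no spike-in reads recovered for {sample}")
        corrected[sample] = {species: n / spike for species, n in counts.items()}
    return corrected


# Hypothetical toy data: sample_B was sequenced four times more deeply.
otu_reads = {
    "sample_A": {"species_1": 200, "species_2": 50},
    "sample_B": {"species_1": 800, "species_2": 100},
}
spike_reads = {"sample_A": 100, "sample_B": 400}

corrected = spike_in_correct(otu_reads, spike_reads)
for sample, values in corrected.items():
    print(sample, values)
```

In this toy case the raw counts suggest that species_1 quadrupled from sample_A to sample_B, whereas the spike-in-scaled values are equal (2.0 in both), and species_2 halves (0.5 to 0.25). The scaled values are meaningful only within a species across samples; they say nothing about across-species abundance, because species pipeline biases are left untouched.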