DNA metabarcoding is becoming the tool of choice for biodiversity assessment across taxa and environments. Yet, the artefacts present in metabarcoding datasets often preclude a proper interpretation of ecological patterns. Bioinformatic pipelines to remove experimental noise exist, but they often target only some of the artefacts produced, or are marker-specific. In addition, existing pipelines seldom provide ways to assess the quality of data curation or the filtering thresholds chosen, partly because appropriate visualisation tools are lacking.
Here, we present metabaR, an R package that provides a comprehensive suite of tools to effectively curate DNA metabarcoding data after basic bioinformatic analyses. In particular, metabaR uses experimental negative or positive controls to identify different types of artefactual sequences, that is, contaminants and tag‐jumps. It also flags potentially dysfunctional PCRs based on the similarity of PCR replicates, when replicates are available. Finally, metabaR provides tools to visualise DNA metabarcoding data characteristics in their experimental context as well as their distribution, and facilitates assessment of the appropriateness of data curation filtering thresholds.
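To make the idea of control-based curation concrete, the following is a minimal, language-neutral sketch written in Python. It does not use metabaR's API and is not necessarily the criterion metabaR implements. The toy read table, the function names (`flag_contaminants`, `filter_tag_jumps`) and the 0.03 threshold are all hypothetical, chosen for illustration only. The sketch assumes that a sequence can be flagged as a contaminant when its relative abundance peaks in a negative control, and that a tag-jump can be approximated as a read count that makes up only a tiny fraction of a sequence's total reads.

```python
# Hypothetical toy data: read counts per sequence in each PCR.
reads = {
    "seq_A": {"sample_1": 900, "sample_2": 850, "neg_ctrl": 2},
    "seq_B": {"sample_1": 5, "sample_2": 0, "neg_ctrl": 120},
    "seq_C": {"sample_1": 400, "sample_2": 3, "neg_ctrl": 0},
}
negative_controls = {"neg_ctrl"}


def relative_abundances(reads):
    """Convert raw counts to per-PCR relative abundances."""
    totals = {}
    for counts in reads.values():
        for pcr, n in counts.items():
            totals[pcr] = totals.get(pcr, 0) + n
    return {
        seq: {pcr: (n / totals[pcr] if totals[pcr] else 0.0)
              for pcr, n in counts.items()}
        for seq, counts in reads.items()
    }


def flag_contaminants(reads, negative_controls):
    """Flag sequences whose relative abundance is highest in a negative control.

    Assumed criterion for illustration only.
    """
    flagged = set()
    for seq, freqs in relative_abundances(reads).items():
        top_pcr = max(freqs, key=freqs.get)
        if freqs[top_pcr] > 0 and top_pcr in negative_controls:
            flagged.add(seq)
    return flagged


def filter_tag_jumps(reads, threshold=0.03):
    """Zero out counts below `threshold` of a sequence's total reads.

    The threshold value is an arbitrary assumption.
    """
    cleaned = {}
    for seq, counts in reads.items():
        total = sum(counts.values())
        cleaned[seq] = {
            pcr: (n if total and n / total >= threshold else 0)
            for pcr, n in counts.items()
        }
    return cleaned


contaminants = flag_contaminants(reads, negative_controls)
cleaned = filter_tag_jumps(reads)
```

On this toy table, only `seq_B` is flagged as a contaminant, and the two reads of `seq_A` in the negative control are set to zero as a putative tag-jump. In practice, any such threshold should be checked against the data rather than fixed in advance.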
metabaR is applicable to any DNA metabarcoding experimental design but is most powerful when the design includes experimental controls and replicates. More generally, the simplicity and flexibility of the package make it applicable to any DNA marker and to data generated with any sequencing platform and pre‐analysed with any bioinformatic pipeline. Its outputs are easily usable for downstream analyses with any ecological R package.
metabaR complements existing bioinformatics pipelines by providing scientists with a variety of functions to effectively clean DNA metabarcoding data and avoid serious misinterpretations. It thus offers a promising platform for automatised data quality assessments of DNA metabarcoding data for environmental research and biomonitoring.