Background: Amplicon sequencing of phylogenetic marker genes, e.g. 16S, 18S or ITS rRNA sequences, is still the most commonly used method to estimate the structure of microbial communities. Microbial ecologists often have expert knowledge of their biological question and of data analysis in general, and most research institutes have the computational infrastructure to run the bioinformatics command-line tools and workflows for amplicon sequencing analysis. However, the bioinformatics skills these tools require often limit the efficient and up-to-date use of computational resources.
Results: dadasnake wraps pre-processing of sequencing reads, delineation of exact sequence variants using the favorably benchmarked and widely used DADA2 algorithm, taxonomic classification, post-processing of the resulting tables, and hand-off in standard formats into a user-friendly, one-command Snakemake pipeline. The suitability of the provided default configurations is demonstrated using mock-community data from bacteria and archaea, as well as fungi.
Conclusions: By using Snakemake, dadasnake makes efficient use of high-performance computing infrastructures. Simple user configuration allows flexible adjustment of all steps, including the processing of data from multiple sequencing platforms. dadasnake is easy to install via conda environments. dadasnake is available at https://github.com/a-h-b/dadasnake .
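The abstract describes a pipeline whose stages run from a single command, with provided defaults that users can adjust. The sketch below illustrates the general idea of layering a user configuration over default settings and deriving which stages will run. It is a minimal, hypothetical illustration assuming a nested-dictionary configuration: the keys (`steps`, `filtering`, `trunc_length`, `max_expected_errors`, `taxonomy`, `database`) and stage names are invented for this example and are not dadasnake's actual configuration schema.

```python
"""Hypothetical sketch: overlaying user settings on pipeline defaults.

All keys and stage names are illustrative assumptions, not dadasnake's
real configuration schema.
"""
from copy import deepcopy
from typing import Any, Dict, List

# Hypothetical default settings, one section per processing concern.
DEFAULTS: Dict[str, Any] = {
    "steps": {
        "preprocessing": True,
        "dada": True,
        "taxonomy": True,
        "postprocessing": True,
    },
    "filtering": {"trunc_length": 0, "max_expected_errors": 2},
    "taxonomy": {"database": "silva"},
}

# Fixed order in which the (hypothetical) stages would execute.
STAGE_ORDER = ["preprocessing", "dada", "taxonomy", "postprocessing"]


def merge_config(defaults: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay user settings on defaults without mutating either."""
    merged = deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalar or new key: the user value replaces the default.
            merged[key] = deepcopy(value)
    return merged


def planned_steps(config: Dict[str, Any]) -> List[str]:
    """Return the enabled stages in their fixed execution order."""
    return [s for s in STAGE_ORDER if config["steps"].get(s, False)]


if __name__ == "__main__":
    # A user changes one filtering parameter and skips classification.
    user = {"filtering": {"trunc_length": 250}, "steps": {"taxonomy": False}}
    cfg = merge_config(DEFAULTS, user)
    print(cfg["filtering"])    # {'trunc_length': 250, 'max_expected_errors': 2}
    print(planned_steps(cfg))  # ['preprocessing', 'dada', 'postprocessing']
```

Merging section by section means a user only needs to state the settings that differ from the defaults; everything else is inherited unchanged.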
Findings
Background
Since the first reports 15 years ago [1], high-throughput amplicon sequencing has become the most common approach to monitor microbial diversity in environmental samples. Sequencing preparation, throughput and precision have improved consistently, while costs have decreased. Computational methods have been refined in recent years, especially with the shift to exact sequence variants and better use of sequence quality data [2,3]. Amplicon sequencing can have severe limitations, such as limited and uneven taxonomic resolution [4,5], over- and underestimation of diversity [6,7], lack of quantitative value [8,9] and missing functional information; nevertheless, it is still considered the method of choice to gain an overview of microbial diversity across a large number of samples [10,11]. Consequently, typical amplicon sequencing datasets have grown in size. In addition, synthesis efforts integrating multiple datasets are being undertaken, and these require efficient processing pipelines for amplicon sequencing data [12]. Because each dataset has unique, microbiome-specific characteristics and the community structure data must be integrated with other data types, such as abiotic or biotic parameters, users of data processing tools need expert knowledge of their biological question and of statistics. As these users are not necessarily bioinformaticians, workflows should be as user-friendly as possible. Several widely used workflows exist, e.g. qiime2 [13], mothur [14], usearch [15] and lOTUs [16], and new approaches are continually being developed, e.g. OCToPUS [17] and PEMA [18]; these tools strike different balances between learning curve, configurability and efficiency.
Purpose of dadasna...