Background
In light of the current biodiversity crisis, DNA barcoding is developing into an essential tool to quantify state shifts in global ecosystems. Current barcoding protocols often rely on short amplicon sequences, which yield accurate identification of biological entities in a community, but provide limited phylogenetic resolution across broad taxonomic scales. However, the phylogenetic structure of communities is an essential component of biodiversity. Consequently, a barcoding approach is required that unites robust taxonomic assignment power and high phylogenetic utility. A possible solution is offered by sequencing long ribosomal DNA (rDNA) amplicons on the MinION platform (Oxford Nanopore Technologies).
Results
Using a dataset of various animal and plant species, with a focus on arthropods, we assemble a pipeline for long rDNA barcode analysis and introduce a new software tool (MiniBar) to demultiplex dual-indexed nanopore reads. We find that long rDNA sequences offer excellent phylogenetic and taxonomic resolution across broad taxonomic scales. We highlight the simplicity of our approach by field barcoding with a miniaturized, mobile laboratory in a remote rainforest. We also test the utility of long rDNA amplicons for analysis of community diversity through metabarcoding and find that they recover highly skewed diversity estimates.
Conclusions
Sequencing dual-indexed, long rDNA amplicons on the MinION platform is a straightforward, cost-effective, portable and universal approach for eukaryote DNA barcoding. Long rDNA amplicons scale up DNA barcoding by enabling the accurate recovery of taxonomic and phylogenetic diversity. However, bulk community analyses using long-read approaches may introduce biases and will require further exploration.
MiniBar
We created demultiplexing software called MiniBar. It allows customization of search parameters to account for the high read error rates and has built-in awareness of the dual barcode and primer pairs flanking the sequences. MiniBar takes as input a tab-delimited barcode file and a sequence file in either fasta or fastq format. The barcode file contains, at a minimum, the sample name, forward barcode, forward primer, reverse barcode, and reverse primer for each of the samples potentially present in the sequence file. The software searches for barcodes and for a primer, each permitting a user-defined number of errors, where an error is a mismatch or an indel. The error threshold for a match can be given either as a percentage of the barcode and primer lengths or separately for barcode and primer as a maximum edit distance [49]. Output options permit saving each sample in its own file or all samples in a single file, with the sample names in the fasta or fastq headers. The barcode and primer pairs that are found can be trimmed from the sequence or left in place, distinguished by case or color. MiniBar is written in Python 2.7, also runs under Python 3, and has a single dependency, the Edlib library, which provides approximate search based on edit distance [50]. MiniBar can be found at...
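To illustrate the idea of dual-index demultiplexing with an edit-distance tolerance, the following is a minimal sketch in Python 3. It is not MiniBar's code. It makes several assumptions: a plain dynamic-programming search (Sellers' algorithm) stands in for Edlib, both primers are required to match, barcodes and primers are searched only within a short window at each read end, reads are in forward orientation, and the barcode file has no header row. All function names, the `slack` parameter, the default thresholds, and the barcode, primer, and read sequences are hypothetical and chosen only for illustration.

```python
import csv
import io

def best_infix_match(pattern, text):
    """Sellers' algorithm: (edit distance, end index) of the best approximate
    occurrence of `pattern` anywhere inside `text`."""
    prev = [0] * (len(text) + 1)          # row 0: a match may start anywhere
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(prev[j - 1] + (p != t),  # match or mismatch
                          prev[j] + 1,             # pattern base missing from text
                          curr[j - 1] + 1)         # extra base in text
        prev = curr
    dist = min(prev)
    return dist, prev.index(dist)

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def read_barcodes(handle):
    """Parse tab-delimited rows: sample, fwd barcode, fwd primer,
    rev barcode, rev primer (assumed column order, no header row)."""
    rows = csv.reader(handle, delimiter="\t")
    return [tuple(r[:5]) for r in rows if len(r) >= 5]

def assign_sample(read, samples, max_barcode_errors=1, max_primer_errors=2, slack=5):
    """Return the sample whose barcodes and primers all match within the
    allowed edit distances (lowest total error wins), or None."""
    best_name, best_errors = None, None
    for name, fbar, fprim, rbar, rprim in samples:
        # Assumption: index and primer sit within a short window at each end.
        head = read[:len(fbar) + len(fprim) + slack]
        tail = read[-(len(rbar) + len(rprim) + slack):]
        barcode_errs = [best_infix_match(fbar, head)[0],
                        best_infix_match(revcomp(rbar), tail)[0]]
        primer_errs = [best_infix_match(fprim, head)[0],
                       best_infix_match(revcomp(rprim), tail)[0]]
        if max(barcode_errs) > max_barcode_errors or max(primer_errs) > max_primer_errors:
            continue
        total = sum(barcode_errs) + sum(primer_errs)
        if best_errors is None or total < best_errors:
            best_name, best_errors = name, total
    return best_name

# Made-up barcode file with two samples that share one primer pair.
BARCODE_TSV = (
    "S1\tAAAAAAAACC\tCTGGTTGATCCTGCCAGT\tCCCCCCCCAA\tTGATCCTTCTGCAGGTTCACC\n"
    "S2\tGGTTGGTTGG\tCTGGTTGATCCTGCCAGT\tTTGGTTGGTT\tTGATCCTTCTGCAGGTTCACC\n"
)
samples = read_barcodes(io.StringIO(BARCODE_TSV))

# Synthetic read for S2 with one substitution in the forward barcode.
insert = "TGCTGACCTTGGACTTGCATGCTG"
read = ("GGTTGCTTGG" + "CTGGTTGATCCTGCCAGT" + insert
        + revcomp("TGATCCTTCTGCAGGTTCACC") + revcomp("TTGGTTGGTT"))
print(assign_sample(read, samples))
```

In this toy example the synthetic read carries a single barcode mismatch and is still assigned to sample S2, while a read lacking recognizable indices would return `None`. A percentage-based threshold, as described above, could be derived from each barcode or primer length before calling `assign_sample`; that step is left out of the sketch.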