Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome and infer the haplogroup from the mutations detected relative to this reference. This approach biases haplogroup assignments towards the reference and prevents accurate estimation of the uncertainty in the assignment. We present HaploCart, a probabilistic mtDNA haplogroup classifier that uses VG's pangenomic reference graph framework together with principles of Bayesian inference. We demonstrate that our approach significantly outperforms available tools: it is more robust to lower coverage and to incomplete consensus sequences, and it produces phylogenetically aware confidence scores that are unbiased towards any haplogroup. HaploCart is available both as a command-line tool and through a user-friendly web interface. The program, written in C++, accepts consensus FASTA, FASTQ, or GAM files as input and outputs a text file with the haplogroup assignment of each sample along with confidence estimates. Our work considerably reduces the amount of data required to obtain a confident mitochondrial haplogroup assignment. The command-line tool is available at https://github.com/grenaud/vgan and the web server at https://services.healthtech.dtu.dk/service.php?HaploCart.
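To make the Bayesian idea above concrete, the sketch below turns per-haplogroup log-likelihoods into posterior probabilities and then sums posterior mass over a clade, which is one simple way a confidence score can account for the phylogeny. It is an illustrative sketch under stated assumptions, not HaploCart's implementation: the per-haplogroup log-likelihoods (sum over reads of log P(read | haplogroup)) are assumed to be already computed, for example from graph alignments, and the function names, the uniform default prior, and the toy tree are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def haplogroup_posterior(log_likelihoods, log_prior=None):
    """Hypothetical helper: posterior P(h | reads) for each haplogroup h.

    log_likelihoods: dict haplogroup -> sum over reads of log P(read | h)
                     (assumed precomputed; not part of the original text).
    log_prior: optional dict of log prior probabilities; uniform if omitted.
    """
    names = list(log_likelihoods)
    if log_prior is None:
        log_prior = {h: -math.log(len(names)) for h in names}
    # Bayes' rule in log space: log P(h | D) = log P(D | h) + log P(h) - log Z
    joint = [log_likelihoods[h] + log_prior[h] for h in names]
    log_z = log_sum_exp(joint)
    return {h: math.exp(j - log_z) for h, j in zip(names, joint)}


def clade_posterior(posterior, children, root):
    """Hypothetical helper: total posterior of `root` and all its descendants."""
    stack, total = [root], 0.0
    while stack:
        node = stack.pop()
        total += posterior.get(node, 0.0)
        stack.extend(children.get(node, []))
    return total


if __name__ == "__main__":
    # Toy values for illustration only; these are not real data.
    ll = {"H": -10.0, "H1": -9.0, "H2": -12.0, "U": -20.0}
    tree = {"H": ["H1", "H2"]}
    post = haplogroup_posterior(ll)
    best = max(post, key=post.get)
    print(best, round(post[best], 3), round(clade_posterior(post, tree, "H"), 3))
```

The clade sum illustrates why a phylogeny-aware score can stay high even when the evidence is split between closely related sub-haplogroups: the single best leaf may be uncertain while the parent clade is almost certain.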
1. Ancient environmental DNA (eDNA) is a crucial source of information for reconstructing past environments. However, the computational analysis of ancient eDNA involves not only the challenges inherent to ancient DNA (aDNA) but also the typical difficulties of eDNA samples, such as taxonomic identification and abundance estimation of the identified taxonomic groups. Current methods for ancient eDNA fall into two groups: those that only perform mapping followed by taxonomic identification, and those that purport to estimate abundance. The former leave abundance estimation to the user, while methods of the latter kind are not designed for large metagenomic datasets and are often imprecise and difficult to use.

2. Here, we introduce euka, a tool designed for rapid and accurate characterisation of ancient eDNA samples. We use a taxonomy-based pangenome graph of reference genomes to assign DNA sequences robustly and a maximum-likelihood framework to estimate abundances. At present, our database is restricted to the mitochondrial genomes of tetrapods and arthropods, but it can be expanded in future versions.

3. We find that euka outperforms current taxonomic profiling tools in both taxon detection and abundance estimation. Crucially, we show that euka achieves higher accuracy regardless of the filtering threshold set for existing methods. Furthermore, our approach is robust to sparse data, which is characteristic of ancient eDNA, detecting taxa with an average of fifty aligned reads. We also show that euka is consistent with competing tools on empirical samples and about ten times faster than current quantification tools.

4. euka's features are fine-tuned to the challenges of ancient eDNA, making it a simple-to-use, all-in-one tool. It is available on GitHub: https://github.com/grenaud/vgan. euka enables researchers to quickly assess and characterise their samples, allowing it to be used as a routine screening tool for ancient eDNA.
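The maximum-likelihood abundance estimation described in point 2 can be illustrated with the standard expectation-maximisation (EM) algorithm for mixture proportions. This is a generic sketch, not euka's implementation: it assumes a precomputed matrix where entry `[r][t]` is P(read r | taxon t), for example derived from alignments to the pangenome graph, and the function name `em_abundance` and its parameters are hypothetical.

```python
def em_abundance(read_likelihoods, n_iter=200, tol=1e-10):
    """Hypothetical helper: ML taxon proportions via EM.

    read_likelihoods: list of rows, one per read; row[t] = P(read | taxon t)
                      (assumed precomputed; not part of the original text).
    Returns a list of estimated proportions, one per taxon, summing to 1.
    """
    n_taxa = len(read_likelihoods[0])
    pi = [1.0 / n_taxa] * n_taxa  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: expected number of reads contributed by each taxon
        counts = [0.0] * n_taxa
        for row in read_likelihoods:
            weights = [p * lik for p, lik in zip(pi, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # read not explained by any taxon; ignore it
            for t, w in enumerate(weights):
                counts[t] += w / total
        n_used = sum(counts)
        if n_used == 0.0:
            return pi  # no informative reads; keep the current estimate
        # M-step: proportions are the normalised expected counts
        new_pi = [c / n_used for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


if __name__ == "__main__":
    # Toy matrix: three reads unique to taxon 0, one unique to taxon 1,
    # and one read equally compatible with both (illustrative values only).
    reads = [[1, 0], [1, 0], [1, 0], [0, 1], [1, 1]]
    print(em_abundance(reads))
```

Ambiguous reads are split between taxa in proportion to the current abundance estimates rather than being discarded or assigned to a single taxon, which is the main reason an EM-based estimator can make use of multi-mapping reads.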