Background
Y-chromosome DNA (Y-DNA) has been used to trace paternal lineages and offers a clear path from an individual to a known, or likely, direct paternal ancestor. Advances in next-generation sequencing (NGS) technologies continue to improve the resolution of the non-recombining region of the Y-chromosome (NRY). However, a lack of suitable computational tools hinders the use of NGS data in Y-DNA studies.
Results
We developed Y-LineageTracker, a high-throughput analysis framework that not only uses state-of-the-art methodologies to automatically determine NRY haplogroups and identify Y-chromosome microsatellite variants on a fine scale, but also optimizes comprehensive Y-DNA analysis methods for NGS data. Notably, Y-LineageTracker integrates NRY haplogroup and Y-STR analysis modules with recognized strategies to support robust interpretation of paternal genetics and evolution. The NRY haplogroup module covers haplogroup classification, clustering analysis, phylogeny construction, and divergence time estimation of NRY haplogroups. The Y-STR module covers Y-STR genotyping, statistical calculation, network analysis, and estimation of the time to the most recent common ancestor (TMRCA) based on Y-STR haplotypes. A performance comparison indicated that Y-LineageTracker outperformed existing Y-DNA analysis tools in terms of performance and visualization quality.
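To make the TMRCA idea concrete, the following is a minimal sketch of one commonly used estimator, the average squared distance (ASD) under a single-step stepwise mutation model, in which the expected squared difference in repeat count from the ancestral allele grows as the mutation rate times the number of generations. It is an illustration only and is not taken from Y-LineageTracker: the function name `tmrca_asd`, the toy haplotypes, the mutation rates, and the 30-year generation time are all assumptions chosen for the example.

```python
from statistics import mean


def tmrca_asd(haplotypes, ancestral, mutation_rates, years_per_generation=30.0):
    """Estimate TMRCA from Y-STR haplotypes using average squared distance.

    haplotypes: list of dicts mapping locus name -> repeat count
    ancestral: dict mapping locus name -> assumed ancestral repeat count
    mutation_rates: dict mapping locus name -> mutations per generation
    Returns (generations, years).
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")

    per_locus_generations = []
    for locus, rate in mutation_rates.items():
        # Mean squared difference between observed and ancestral repeat counts.
        asd = mean((h[locus] - ancestral[locus]) ** 2 for h in haplotypes)
        # Under the stepwise model, E[ASD] = rate * t, so t = ASD / rate.
        per_locus_generations.append(asd / rate)

    # Average the per-locus estimates to get a single value.
    generations = mean(per_locus_generations)
    return generations, generations * years_per_generation


# Hypothetical toy data: four samples typed at two loci.
haplotypes = [
    {"DYS19": 14, "DYS390": 24},
    {"DYS19": 15, "DYS390": 24},
    {"DYS19": 14, "DYS390": 23},
    {"DYS19": 14, "DYS390": 24},
]
ancestral = {"DYS19": 14, "DYS390": 24}           # assumed founder haplotype
mutation_rates = {"DYS19": 0.002, "DYS390": 0.002}  # illustrative values only

generations, years = tmrca_asd(haplotypes, ancestral, mutation_rates)
print(f"TMRCA: {generations:.1f} generations (~{years:.0f} years)")
```

In practice the ancestral haplotype is usually unknown and is often approximated by the modal haplotype, and the result is sensitive to the chosen mutation rates and generation time, so estimates of this kind are best reported with confidence intervals.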
Conclusions
Y-LineageTracker is an open-source, user-friendly command-line tool that provides multiple functions to efficiently analyze Y-DNA from NGS data at both the Y-SNP and Y-STR levels. Additionally, Y-LineageTracker supports various input data formats and produces high-quality figures suitable for publication. Y-LineageTracker is written in Python 3, supports Windows, Linux, and macOS, and can be installed manually or via the Python Package Index (PyPI). The source code, examples, and manual of Y-LineageTracker are freely available at https://www.picb.ac.cn/PGG/resource.php or on CodeOcean (https://codeocean.com/capsule/7424381/tree).