2024
DOI: 10.1101/2024.03.26.586362
Preprint

GenoTools: An Open-Source Python Package for Efficient Genotype Data Quality Control and Analysis

Dan Vitale, Mathew Koretsky, Nicole Kuznetsov, et al.

Abstract: GenoTools, a Python package, streamlines population genetics research by integrating ancestry estimation, quality control (QC), and genome-wide association study (GWAS) capabilities into efficient pipelines. By tracking samples, variants, and quality-specific measures throughout fully customizable pipelines, GenoTools lets users easily manage genetic data for both large and small studies. GenoTools' Ancestry module renders highly accurate predictions, allowing for high-quality ancestry-specific studies, and enables custom …

Cited by 9 publications (3 citation statements)
References 17 publications (25 reference statements)
“…Variants flagged by GCAD for removal were excluded, with additional variant and sample QC conducted using GenoTools²¹. Variants were excluded if the call rate was <0.95 or if they were not in Hardy–Weinberg equilibrium (p < 1×10⁻⁴); samples were excluded for a call rate <0.95, discordant sex based on X chromosome heterozygosity, cryptic relatedness, or either insufficient or excessive heterozygosity.…”
Section: Methods · mentioning (confidence: 99%)
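The variant and sample filters described in the statement above can be illustrated with a minimal sketch. It is not GenoTools code and does not use its API: the genotype encoding (alternate-allele dosage 0/1/2, `None` for a missing call), the function names `call_rate`, `hwe_chisq_p`, and `run_qc`, and the filter order (variant call rate, then sample call rate, then HWE) are all hypothetical choices. The HWE test here is a 1-df chi-square goodness-of-fit approximation, whereas many QC tools use an exact test. The sex-check, relatedness, and heterozygosity filters are not covered.

```python
import math

# Hypothetical encoding: each genotype is an alternate-allele dosage (0, 1, 2)
# or None for a missing call. Thresholds follow the citing study's description;
# they are not presented as GenoTools defaults.
CALL_RATE_MIN = 0.95  # exclude variants/samples with call rate < 0.95
HWE_P_MIN = 1e-4      # exclude variants with HWE p < 1e-4


def call_rate(calls):
    """Fraction of non-missing calls (0.0 for an empty list)."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def hwe_chisq_p(calls):
    """HWE p-value from a 1-df chi-square goodness-of-fit test.

    Illustrative approximation only; exact HWE tests are common in practice.
    """
    obs = [0, 0, 0]  # counts of dosage 0, 1, 2
    for g in calls:
        if g is not None:
            obs[g] += 1
    n = sum(obs)
    if n == 0:
        return 1.0
    p = (2 * obs[0] + obs[1]) / (2 * n)  # reference-allele frequency
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 1.0  # monomorphic site: test undefined, so keep the variant
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def run_qc(genotypes, n_samples):
    """genotypes: dict of variant_id -> list of calls in a fixed sample order.

    Returns (kept_variant_ids, kept_sample_indices).
    """
    # 1) Variant call rate across all samples.
    variants = [v for v, calls in genotypes.items()
                if call_rate(calls) >= CALL_RATE_MIN]
    # 2) Sample call rate across the variants that survived step 1.
    samples = [i for i in range(n_samples)
               if call_rate([genotypes[v][i] for v in variants]) >= CALL_RATE_MIN]
    # 3) HWE test on the retained samples only.
    kept = [v for v in variants
            if hwe_chisq_p([genotypes[v][i] for i in samples]) >= HWE_P_MIN]
    return kept, samples


if __name__ == "__main__":
    # 21 toy samples; sample 20 is missing everywhere.
    genotypes = {
        "ok": [0] * 5 + [1] * 10 + [2] * 5 + [None],
        "allhet": [1] * 20 + [None],  # passes call rate, fails HWE
        "miss": [None, None] + [0] * 3 + [1] * 10 + [2] * 5 + [None],
    }
    kept, samples = run_qc(genotypes, 21)
    print(kept)          # ['ok']
    print(len(samples))  # 20 (sample 20 removed)
```

Filtering variants on call rate before computing per-sample call rates keeps a few very poor variants from pushing otherwise good samples below threshold. Other orderings are equally defensible.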
“…These classifier models were then applied to the dataset to generate ancestry estimates for all samples. Detailed methodologies for the cloud-based, scalable pipeline used for genotype calling, QC, and ancestry estimation are documented in the GenoTools²³ GitHub repository (https://doi.org/10.5281/zenodo.10719034)²⁷.…”
Section: Methods · mentioning (confidence: 99%)
“…Additionally, the array includes 96,517 custom variants. Automated genotype data processing was performed with GenoTools²³, a Python pipeline built for quality control and ancestry estimation of genotype data. Additional details are available on PyPI (https://pypi.org/project/the-real-genotools/)²³.…”
Section: Methods · mentioning (confidence: 99%)