2021
DOI: 10.1111/1755-0998.13326

pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data

Abstract: Population genetic analyses often use summary statistics to describe patterns of genetic variation and provide insight into evolutionary processes. Among the most fundamental of these summary statistics are π and dXY, which are used to describe genetic diversity within and between populations, respectively. Here, we address a widespread issue in π and dXY calculation: systematic bias generated by missing data of various types. Many popular methods for calculating π and dXY operate on data encoded in the varian…
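To make the missing-data issue concrete, here is a minimal Python sketch of one way to keep π and dXY unbiased: count pairwise differences and pairwise comparisons separately at every site (invariant sites included), skip missing calls, and divide the summed differences by the summed comparisons. This illustrates the general idea only and is not pixy's actual code; the function names and the convention of marking a missing call with `None` are assumptions made for this example.

```python
from collections import Counter

def site_pi_counts(alleles):
    """Return (differences, comparisons) among called alleles at one site.

    `alleles` is a list of allele codes; None marks a missing call
    (an illustrative convention, not a VCF or pixy format).
    """
    called = [a for a in alleles if a is not None]
    n = len(called)
    comparisons = n * (n - 1) // 2
    # Pairs of identical alleles; every other pair is a difference.
    same = sum(c * (c - 1) // 2 for c in Counter(called).values())
    return comparisons - same, comparisons

def site_dxy_counts(pop_x, pop_y):
    """Return (differences, comparisons) between two populations at one site."""
    x = Counter(a for a in pop_x if a is not None)
    y = Counter(a for a in pop_y if a is not None)
    n_x, n_y = sum(x.values()), sum(y.values())
    # Each allele in x differs from every allele in y that is not the same code.
    differences = sum(cx * (n_y - y[a]) for a, cx in x.items())
    return differences, n_x * n_y

def window_pi(sites):
    """pi for a window: summed differences over summed comparisons."""
    diffs = comps = 0
    for alleles in sites:
        d, c = site_pi_counts(alleles)
        diffs += d
        comps += c
    return diffs / comps if comps else float("nan")

def window_dxy(sites):
    """dXY for a window; `sites` is a list of (pop_x, pop_y) allele lists."""
    diffs = comps = 0
    for pop_x, pop_y in sites:
        d, c = site_dxy_counts(pop_x, pop_y)
        diffs += d
        comps += c
    return diffs / comps if comps else float("nan")

# Three sites, four sampled alleles each, with some calls missing.
example_sites = [
    [0, 0, 1, 1],
    [0, 0, None, None],
    [0, 1, None, 0],
]
pi_estimate = window_pi(example_sites)
```

Because the denominator counts only comparisons that could actually be made, a site with missing genotypes contributes less weight instead of being silently treated as invariant.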

citations
Cited by 342 publications
(232 citation statements)
references
References 33 publications
“…Signature of selective sweep within chr5 candidate region. To test for signatures of a selective sweep within the candidate gene block in the coastal STOW populations, we calculated nucleotide diversity (π) with pixy 72 and linkage equilibrium (LD) around the ~1.2 Mb candidate gene block (see above) with VCFtools and the input being all the coast STOW samples. We calculated pair-wise LD as the squared correlation coefficient of genotypes with the '-geno-r2' command.…”
Section: Methods (mentioning)
confidence: 99%
“…Genetic diversity indices, including the observed heterozygosity (HO) and expected heterozygosity (HE) for weedy and cultivated broomcorn millets were calculated using VCFtools software (Danecek et al., 2011). The analysis of nucleotide diversity (π) was performed using pixy 1.0.0 based on VCF data including invariant sites to avoid biased estimation (Korunes and Samuk, 2021). A VCF file including invariant sites was generated in GATK 3.8 by using the “-allSites” flag in GenotypeGVCFs, with the filtering criteria set to “DP >= 5, GQ >= 40 | RGQ >= 40” for invariant sites.…”
Section: Methods (mentioning)
confidence: 99%
“…The LD decay graphs of both weedy and cultivated broomcorn millets were plotted. The nucleotide diversity (π) and the pairwise fixation index, i.e., F-statistics (FST), across different groups (excluding the “mosaics”) were calculated using pixy (Korunes and Samuk, 2021).…”
Section: Methods (mentioning)
confidence: 99%
“…To account for the differences in read coverage between samples we used the pixy software [77], which produces unbiased estimates of DXY in the presence of missing data. Population divergence (DXY) was calculated from this all-site VCF using pixy version 1.0.4.beta1, either in 10 kb windows or in gene coordinate windows.…”
Section: Methods (mentioning)
confidence: 99%