Population genetic analyses often use summary statistics to describe patterns of genetic variation and provide insight into evolutionary processes. Among the most fundamental of these summary statistics are π and dXY, which are used to describe genetic diversity within and between populations, respectively. Here, we address a widespread issue in π and dXY calculation: systematic bias generated by missing data of various types. Many popular methods for calculating π and dXY operate on data encoded in the variant call format (VCF), which condenses genetic data by omitting invariant sites. When calculating π and dXY using a VCF, it is often implicitly assumed that missing genotypes (including those at sites not represented in the VCF) are homozygous for the reference allele. Here, we show how this assumption can result in substantial downward bias in estimates of π and dXY that is directly proportional to the amount of missing data. We discuss the pervasive nature and importance of this problem in population genetics, and introduce a user‐friendly UNIX command line utility, pixy, that solves this problem via an algorithm that generates unbiased estimates of π and dXY in the face of missing data. We compare pixy to existing methods using both simulated and empirical data, and show that pixy alone produces unbiased estimates of π and dXY regardless of the form or amount of missing data. In summary, our software solves a long‐standing problem in applied population genetics and highlights the importance of properly accounting for missing data in population genetic analyses.
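The bias described above can be illustrated with a toy calculation. The sketch below is not pixy's implementation: the function names, the four-sample data set, and the "summed differences over summed comparisons" estimator are assumptions chosen for illustration. It compares two treatments of missing calls: recoding them as the reference allele, and excluding them from both the numerator and the denominator.

```python
def site_counts(calls):
    """Return (differences, comparisons) at one biallelic site.

    `calls` holds haploid allele calls: 0 (reference), 1 (alternate),
    or None for a missing call. Missing calls are ignored.
    """
    called = [a for a in calls if a is not None]
    n = len(called)
    c1 = sum(called)
    c0 = n - c1
    # c0 * c1 pairs differ; n choose 2 pairs are compared in total.
    return c0 * c1, n * (n - 1) // 2


def pi_missing_as_ref(sites):
    """Average per-site pi after recoding every missing call as 0.

    Assumes at least two samples per site, so comparisons > 0.
    """
    total = 0.0
    for calls in sites:
        filled = [0 if a is None else a for a in calls]
        diffs, comps = site_counts(filled)
        total += diffs / comps
    return total / len(sites)


def pi_missing_aware(sites):
    """Summed differences over summed comparisons, skipping missing calls."""
    diffs = comps = 0
    for calls in sites:
        d, c = site_counts(calls)
        diffs += d
        comps += c
    return diffs / comps


# Hypothetical data: 8 sites, 4 haploid samples, alternating
# variant and invariant sites.
variant = [0, 1, 0, 1]
invariant = [0, 0, 0, 0]
full = [variant, invariant] * 4

# Drop all calls at the first 4 sites (50% missing data).
masked = [[None] * 4 for _ in range(4)] + full[4:]

true_pi = pi_missing_aware(full)        # no missing data
naive_pi = pi_missing_as_ref(masked)    # missing treated as reference
aware_pi = pi_missing_aware(masked)     # missing excluded

print(true_pi, naive_pi, aware_pi)
```

In this constructed example half of the sites are missing, and the estimate that treats missing calls as reference alleles is half the true value, while the estimate that excludes them matches it. Real data will not line up this exactly, because the sites that happen to be missing are a sample of the region.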
This is the author manuscript accepted for publication and has undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as
[...] genetics.

Many summary statistics are based on the comparison of DNA sequences.
Two important summary statistics in this class are π, the average number of nucleotide differences between genotypes drawn from the same population (Nei and Li 1979); and dXY, the average number of nucleotide differences between genotypes drawn from two different populations (Nei and Li 1979). These two summary statistics underlie a large variety of descriptive and inferential procedures in population genetics. For example, π is often used as an estimator of the central population genetic parameter θ (and is thus sometimes styled as θπ). Similarly, dXY is a key statistic for exploring patterns of divergence between populations, particularly in the context of divergence with gene flow (Noor and Bennett 2009; Cruickshank and Hahn 2014; Burri 2017).

Calculation of π and dXY

For a single biallelic locus, π is usually calculated using one of three expressions shown in Equation 1, all of which are exactly equivalent:

$$\pi = \frac{\sum_{i<j} k_{ij}}{\binom{n}{2}} = \frac{c_0 c_1}{\binom{n}{2}} = \left(\frac{n}{n-1}\right) 2\,\frac{c_0}{n}\,\frac{c_1}{n} \qquad \text{(Eq. 1)}$$

(Nei and Li 1979; Gillespie 2004; Hahn 2019)

where $k_{ij}$ corresponds to the count of allelic differences between the ith and jth haploid genotypes, n is the number of samples, and $c_0$ and $c_1$ are the respective counts of the two alleles at the locus. Note that the last expression is simply the sample-size correct...
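The equivalence of the three ways of computing π at a biallelic locus can be checked numerically. The sketch below assumes the three forms are (i) the sum of pairwise differences divided by the number of pairs, (ii) the product of the two allele counts divided by the number of pairs, and (iii) the sample-size-corrected heterozygosity, n/(n-1) times 2pq with p = c1/n. The third form is a reading of the "sample-size corrected" expression mentioned in the text, and the function names and sample data are illustrative.

```python
from itertools import combinations
from math import comb


def pi_pairwise(haplotypes):
    """Sum of differences over all pairs i < j, divided by n choose 2."""
    n = len(haplotypes)
    diffs = sum(a != b for a, b in combinations(haplotypes, 2))
    return diffs / comb(n, 2)


def pi_allele_counts(haplotypes):
    """Product of the two allele counts, divided by n choose 2."""
    n = len(haplotypes)
    c1 = sum(haplotypes)   # count of allele 1
    c0 = n - c1            # count of allele 0
    return c0 * c1 / comb(n, 2)


def pi_corrected_heterozygosity(haplotypes):
    """Expected heterozygosity 2pq with the n / (n - 1) correction."""
    n = len(haplotypes)
    p = sum(haplotypes) / n
    return (n / (n - 1)) * 2 * p * (1 - p)


# Hypothetical sample of n = 8 haploid genotypes at one biallelic
# locus, coded 0/1: c0 = 5 and c1 = 3.
sample = [0, 0, 1, 0, 1, 1, 0, 0]

estimates = (
    pi_pairwise(sample),
    pi_allele_counts(sample),
    pi_corrected_heterozygosity(sample),
)
print(estimates)
```

For this sample, 15 of the 28 pairs differ, so all three functions give 15/28 up to floating-point rounding. The pairwise form costs O(n²) per site, while the count-based forms need only the allele counts, which is why count-based expressions are the practical choice for large samples.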