Although pooled-population sequencing has become a widely used approach for estimating allele frequencies, most work has proceeded in the absence of a proper statistical framework. We introduce a self-sufficient, closed-form, maximum-likelihood estimator for allele frequencies that accounts for errors associated with sequencing, and a likelihood-ratio test statistic that provides a simple means for evaluating the null hypothesis of monomorphism. Unbiased estimates of allele frequencies (where N is the number of individuals sampled) appear to be unachievable, and near-certain identification of a polymorphism requires a minor-allele frequency . A framework is provided for testing for significant differences in allele frequencies between populations, taking into account sampling at the levels of individuals within populations and sequences within pooled samples. Analyses that fail to account for the two tiers of sampling suffer from very large false-positive rates and can become increasingly misleading with increasing depths of sequence coverage. The power to detect significant allele-frequency differences between two populations is very limited unless both the number of sampled individuals and depth of sequencing coverage exceed 100.
Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As the statistical uncertainties associated with sequencing data depend on depths of coverage, we have developed two types of genotype callers. One approach is appropriate for low-coverage sequencing data, and incorporates population-level information on genotype frequencies and error rates pre-estimated by an ML method. Performance evaluation using computer simulations and human data shows that the proposed framework yields less biased estimates of allele frequencies and more accurate genotype calls than current widely used methods. Another type of genotype caller applies to high-coverage sequencing data, requires no prior genotype-frequency estimates, and makes no assumption on the number of alleles at a polymorphic site. Using computer simulations, we determine the depth of coverage necessary to accurately characterize polymorphisms using this second method. We applied the proposed method to high-coverage (mean 18·) sequencing data of 83 clones from a population of Daphnia pulex. The results show that the proposed method enables conservative and reasonably powerful detection of polymorphisms with arbitrary numbers of alleles. We have extended the proposed method to the analysis of genomic data for polyploid organisms, showing that calling accurate polyploid genotypes requires much higher coverage than diploid genotypes. (Cockerham and Weir 1977;Weir 1996), can be used to examine the pattern of linkage disequilibrium. Despite the advantages, some difficulties are associated with highthroughput sequencing technologies. One of the main difficulties is the high sequencing error rates, which typically range from 0.001 to 0.01 per read per site with commonly used sequencing platforms (Glenn 2011;Quail et al. 2012). Second, because sequencing occurs randomly among sites, individuals, and chromosomes in diploid organisms, depths of coverage are variable at all levels. As a result, when depths of coverage are low, there are often missing data, which introduces biases in subsequent population-genetic analyses unless they are statistically accounted for (Pool et al. 2010). KEYWORDSTo call genotypes from high-throughput sequencing data, many statistical methods have been recently developed (e.g
Using data from 83 isolates from a single population, the population genomics of the microcrustacean are described and compared to current knowledge for the only other well-studied invertebrate, These two species are quite similar with respect to effective population sizes and mutation rates, although some features of recombination appear to be different, with linkage disequilibrium being elevated at short ([Formula: see text] bp) distances in and at long distances in The study population adheres closely to the expectations under Hardy-Weinberg equilibrium, and reflects a past population history of no more than a twofold range of variation in effective population size. Fourfold redundant silent sites and a restricted region of intronic sites appear to evolve in a nearly neutral fashion, providing a powerful tool for population genetic analyses. Amino acid replacement sites are predominantly under strong purifying selection, as are a large fraction of sites in UTRs and intergenic regions, but the majority of SNPs at such sites that rise to frequencies [Formula: see text] appear to evolve in a nearly neutral fashion. All forms of genomic sites (including replacement sites within codons, and intergenic and UTR regions) appear to be experiencing an [Formula: see text] higher level of selection scaled to the power of drift in , but this may in part be a consequence of recent demographic changes. These results establish as an excellent system for future work on the evolutionary genomics of natural populations.
A central problem in population genetics is to detect and analyze positive natural selection by which beneficial mutations are driven to fixation. The hitchhiking effect of a rapidly spreading beneficial mutation, which results in local removal of standing genetic variation, allows such an analysis using DNA sequence polymorphism. However, the current mathematical theory that predicts the pattern of genetic hitchhiking relies on the assumption that a beneficial mutation increases to a high frequency in a single randommating population, which is certainly violated in reality. Individuals in natural populations are distributed over a geographic space. The spread of a beneficial allele can be delayed by limited migration of individuals over the space and its hitchhiking effect can also be affected. To study this effect of geographic structure on genetic hitchhiking, we analyze a simple model of directional selection in a subdivided population. In contrast to previous studies on hitchhiking in subdivided populations, we mainly investigate the range of sufficiently high migration rates that would homogenize genetic variation at neutral loci. We provide a heuristic mathematical analysis that describes how the genealogical structure at a neutral locus linked to the locus under selection is expected to change in a population divided into two demes. Our results indicate that the overall strength of genetic hitchhiking-the degree to which expected heterozygosity decreases-is diminished by population subdivision, mainly because opportunity for the breakdown of hitchhiking by recombination increases as the spread of the beneficial mutation across demes is delayed when migration rate is much smaller than the strength of selection. Furthermore, the amount of genetic variation after a selective sweep is expected to be unequal over demes: a greater reduction in expected heterozygosity occurs in the subpopulation from which the beneficial mutation originates than in its neighboring subpopulations. This raises a possibility of detecting a "hidden" geographic structure of population by carefully analyzing the pattern of a selective sweep. W HEN a beneficial mutation arises in a population and rapidly increases to high frequency by directional selection, it also increases the frequency of neutral alleles on the same chromosome at linked polymorphic loci, resulting in a sudden reduction in genetic variation. This effect, termed genetic hitchhiking (Maynard Smith and Haigh 1974) Natural populations are composed of individuals that are distributed over geographical space. Therefore, with limited migration, mating occurs more frequently among individuals that are close to each other than among those that are far apart on the space. This geographic structure is often reduced to a simple demographic model in which a number of small populations (or demes), each of which is panmictic, are connected to each other by limited migration (Wright 1940). In this model, a proportion m of individuals in a deme is replaced by migrants from...
An improved understanding of the biological and numerical properties of measures of population differentiation across loci is becoming increasingly more important because of their growing use in analyzing genome-wide polymorphism data for detecting population structures, inferring the rates of migration, and identifying local adaptations. In a genome-wide analysis, we discovered that the estimates of population differentiation (e.g., F ST , , and Jost's D) calculated for human single-nucleotide polymorphisms (SNPs) are strongly and positively correlated to the position-specific evolutionary rates measured from multispecies alignments. That is, genomic positions (loci) experiencing higher purifying selection (lower evolutionary rates) produce lower values for the degree of population differentiation than those evolving with faster rates. We show that this pattern is completely mediated by the negative effects of purifying selection on the minor allele frequency (MAF) at individual loci. Our results suggest that inferences and methods relying on the comparison of population differentiation estimates (F ST , , and Jost's D) based on SNPs across genomic positions should be restricted to loci with similar MAFs and/or the rates of evolution in genome scale surveys.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.