The explosion in population genomic data demands ever more complex modes of analysis, and increasingly these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open-source project that provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
We report sequencing-based whole-genome association analyses to evaluate the impact of rare and founder variants on stature in 6,307 individuals on the island of Sardinia. We identified two variants with large effects. One is a stop codon in the GHR gene, relatively frequent in Sardinia (0.87% vs <0.01% elsewhere), which in homozygosity causes the short-stature Laron syndrome. We find that it reduces height in heterozygotes by an average of 4.2 cm (−0.64 s.d.). The other variant, in the imprinted KCNQ1 gene (MAF = 7.7% vs <1% elsewhere), reduces height by an average of 1.83 cm (−0.31 s.d.) when maternally inherited. Additionally, polygenic scores indicate that known height-decreasing alleles are at systematically higher frequency in Sardinians than would be expected by genetic drift. The findings are consistent with selection toward shorter stature in Sardinia and offer a suggestive human example of the proposed “island effect” reducing the size of large mammals.
Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived from the 1000 Genomes Project. Compared with leading imputation programs, the matrix completion algorithm embodied in our program MENDEL-IMPUTE achieves comparable imputation accuracy while reducing run times significantly. Implementation in a lower-level language such as Fortran or C is apt to further improve computational efficiency.
[Supplemental material is available for this article.]
Modern genomics studies almost invariably deal with massive amounts of data. Data sets collected on single nucleotide polymorphisms (SNPs), next-generation sequencing (NGS), copy number variations (CNV), and RNA-seq all fall into this category. In spite of the universal occurrence of missing data, downstream analysis methods usually depend on complete data. For instance, in genome-wide association studies (GWAS), genotype imputation is essential not only for predicting the occasionally missing genotypes in a SNP panel but also for combining data from different panels typed on different platforms. Exploiting these in silico genotypes can boost the power of association studies, encourage finer-scale gene mapping, and enable meta-analysis.
Several software packages are available for genotype imputation, notably fastPHASE ( . Recent reviews provide comprehensive comparisons of these methods (Nothnagel et al. 2009; Marchini and Howie 2010). All existing packages rely on a probabilistic model of linkage disequilibrium to construct and connect underlying haplotypes. Genotype imputation is based on either inferred haplotypes or a set of reference haplotypes read into the programs.
At the genomic scale, computation is highly intensive. Imputing a single chromosome with about 10^5 SNPs typically takes hours for 100 individuals and days for 1000 individuals. Because NGS routinely yields at least a few orders of magnitude more SNP data than genotyping chips, genotype imputation may well hit a computational wall in the near future.
In the machine learning community, matrix completion is a popular and effective imputation tool in many domains outside of genetics (Candès and Recht 2009; Cai et al. 2010; Mazumder et al. 2010). Matrix completion aims to recover an entire matrix when only a small portion of its entries are actually observed. In the spirit of Occam's razor, it seeks the simplest matrix consistent with the observed entries. This criterion conveniently translates into searching for a low rank matrix with a small squared error difference over the observed entries. The celebrated Netflix Challenge represented a typical application to recommender systems (Koren et al. 2009). The Netflix data consist of ratings (1, 2, 3, 4, or 5) of 480,189 customers on ...
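The low-rank criterion described in the excerpt above can be illustrated with a minimal sketch. It assumes a fixed target rank and a simple iterate-and-truncate scheme (fill the gaps with the current estimate, then take a truncated SVD). This is a generic textbook-style procedure chosen for illustration, not the MENDEL-IMPUTE algorithm. The function name `low_rank_impute`, its parameters, and the toy matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def low_rank_impute(X, observed, rank=1, n_iter=500, tol=1e-9):
    """Fill unobserved entries of X using a rank-`rank` fit to the observed ones.

    X        : 2-D array; values at unobserved positions are ignored (may be NaN)
    observed : boolean array of the same shape, True where X was measured
    (Hypothetical helper for illustration only.)
    """
    X = np.asarray(X, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Initial guess: each column's mean over its observed entries.
    counts = np.maximum(observed.sum(axis=0), 1)
    col_means = np.where(observed, X, 0.0).sum(axis=0) / counts
    Z = np.tile(col_means, (X.shape[0], 1))

    for _ in range(n_iter):
        # Keep the observed values and plug the current estimate into the gaps.
        filled = np.where(observed, X, Z)
        # Best rank-`rank` approximation of the filled matrix (truncated SVD).
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        change = np.linalg.norm(Z_new - Z)
        Z = Z_new
        if change < tol:
            break

    # Observed entries are returned unchanged; only the gaps are imputed.
    return np.where(observed, X, Z)


# Toy example (hypothetical data): an exactly rank-1 matrix with three hidden entries.
truth = np.outer([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 1.0, 2.0, 2.0, 1.0])
observed = np.ones(truth.shape, dtype=bool)
observed[0, 2] = observed[2, 0] = observed[4, 4] = False
X = np.where(observed, truth, np.nan)

completed = low_rank_impute(X, observed, rank=1)
max_err = np.abs(completed - truth).max()
print("largest imputation error:", max_err)
```

In this toy case the true matrix is exactly rank 1, so the hidden entries are determined by the observed ones. Real genotype matrices are only approximately low rank, and both the choice of rank and the handling of noise would need considerably more care than this sketch provides.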
Gene flow of transgenes into non-target populations is an important biosafety concern. The case of genetically modified (GM) maize in Mexico has been of particular interest because of the country’s status as center of origin and landrace diversity. In contrast to maize in the U.S. and Europe, Mexican landraces form part of an evolving metapopulation in which new genes are subject to evolutionary processes of drift, gene flow and selection. Although these processes are affected by seed management and particularly seed flow, there has been little study into the population genetics of transgenes under traditional seed management. Here, we combine recently compiled data on seed management practices with a spatially explicit population genetic model to evaluate the importance of seed flow as a determinant of the long-term fate of transgenes in traditional seed systems. Seed flow between farmers leads to a much wider diffusion of transgenes than expected by pollen movement alone, but a predominance of seed replacement over seed mixing lowers the probability of detection due to a relative lack of homogenization in spatial frequencies. We find that in spite of the spatial complexities of the modeled system, persistence probabilities under positive selection are estimated quite well by existing theory. Our results have important implications concerning the feasibility of long term transgene monitoring and control in traditional seed systems.