Although many computer programs can perform population genetics calculations, they are typically limited in the analyses and data input formats they offer; few applications can process the large data sets produced by whole-genome resequencing projects. Furthermore, there is no coherent framework for easily integrating new statistics into existing pipelines, which hinders the development and application of new population genetics and genomics approaches. Here, we present PopGenome, a population genomics package for the R software environment (a de facto standard for statistical analyses). PopGenome can efficiently process genome-scale data as well as large sets of individual loci. It reads DNA alignments and single-nucleotide polymorphism (SNP) data sets in most common formats, including those used by the HapMap, 1000 Genomes, and 1001 Arabidopsis Genomes projects. PopGenome also reads associated annotation files in GFF format, enabling users to easily define regions or classify SNPs based on their annotation; all analyses can also be applied to sliding windows. PopGenome offers a wide range of population genetics analyses, including neutrality tests as well as statistics for population differentiation, linkage disequilibrium, and recombination. PopGenome is linked to Hudson’s MS and Ewing’s MSMS programs to assess statistical significance based on coalescent simulations. PopGenome’s integration in R facilitates reproducible downstream analyses as well as the production of publication-quality graphics. Developers can easily incorporate new analysis methods into the PopGenome framework. PopGenome and R are freely available from CRAN (http://cran.r-project.org/) for all major operating systems under the GNU General Public License.
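To make "neutrality tests applied to sliding windows" concrete, the following is a minimal sketch of one such test, Tajima's D, computed over consecutive site windows. It is written in plain Python rather than R and is not PopGenome's API; the function names, the '0'/'1' string encoding of haplotypes, and the half-open window convention are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotypes coded as '0'/'1' strings (hypothetical helper)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(haplotypes[0])
    # S: number of segregating sites (columns carrying more than one allele)
    s = sum(1 for j in range(length) if len({h[j] for h in haplotypes}) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # pi: mean number of pairwise differences over all n*(n-1)/2 pairs
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard normalizing constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def windowed_tajimas_d(haplotypes, width, step):
    """Apply tajimas_d to site windows [start, start + width), moving by step."""
    length = len(haplotypes[0])
    results = []
    for start in range(0, length - width + 1, step):
        window = [h[start:start + width] for h in haplotypes]
        results.append((start, start + width, tajimas_d(window)))
    return results
```

For example, four haplotypes that each carry a private singleton ("1000", "0100", "0010", "0001") give a negative D, while two pairs of identical haplotypes at intermediate frequency ("1100", "1100", "0011", "0011") give a positive D. Windows without segregating sites return NaN here rather than 0, which keeps "no data" distinct from "no deviation from neutrality".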
Background: Research over the last 10 years highlights the increasing importance of hybridization between species as a major force structuring the evolution of genomes and potentially providing raw material for adaptation by natural and/or sexual selection. Fueled by work in a few model systems where phenotypic hybrids are easily identified, research into hybridization and introgression (the flow of genes between species) has exploded with the advent of whole-genome sequencing and emerging methods to detect the signature of hybridization at the whole-genome or chromosome level. Among these is a general class of methods that use patterns of single-nucleotide polymorphisms (SNPs) across a tree as markers of hybridization. These methods have been applied to a variety of genomic systems, ranging from butterflies to Neanderthals, to detect introgression; however, when employed at a fine genomic scale, they perform poorly at quantifying introgression in small sample windows. Results: We introduce a novel method to detect introgression by combining two widely used statistics: pairwise nucleotide diversity d_XY and Patterson’s D. The resulting statistic, the distance fraction (d_f), accounts for genetic distance across possible topologies and is designed to simultaneously detect and quantify introgression. We also relate our new method to the recently published f_d and incorporate these statistics into the powerful genomics R package PopGenome, freely available on GitHub (pievos101/PopGenome) and the Comprehensive R Archive Network (CRAN). The supplemental material contains a wide range of simulation studies and a detailed manual on how to compute the statistics within the PopGenome framework. Conclusion: We present a new distance-based statistic, d_f, that avoids the pitfalls of Patterson’s D when applied to small genomic regions and accurately quantifies the fraction of introgression (f) for a wide range of simulation scenarios.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2747-z) contains supplementary material, which is available to authorized users.
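The abstract above builds d_f from two ingredients, Patterson's D and the pairwise distance d_XY. The sketch below computes both from per-site derived-allele frequencies for four populations arranged as (((P1, P2), P3), O), using the standard frequency-based ABBA/BABA weights. It does not implement d_f itself, whose exact definition is given in the paper and is not reproduced here; the function names and the choice to average d_XY over the supplied sites only are assumptions for illustration, not PopGenome's interface.

```python
def site_weights(p1, p2, p3, p4):
    """ABBA and BABA weights at one site from derived-allele frequencies."""
    abba = (1 - p1) * p2 * p3 * (1 - p4)
    baba = p1 * (1 - p2) * p3 * (1 - p4)
    return abba, baba


def pattersons_d(P1, P2, P3, P4):
    """D = sum(ABBA - BABA) / sum(ABBA + BABA) over all supplied sites."""
    num = den = 0.0
    for p1, p2, p3, p4 in zip(P1, P2, P3, P4):
        abba, baba = site_weights(p1, p2, p3, p4)
        num += abba - baba
        den += abba + baba
    return num / den if den > 0 else float("nan")


def dxy(PX, PY):
    """Mean per-site probability that one sequence from X and one from Y differ."""
    per_site = [px * (1 - py) + py * (1 - px) for px, py in zip(PX, PY)]
    return sum(per_site) / len(per_site)
```

With two sites that are both ABBA patterns (P1 and O ancestral, P2 and P3 derived), D is 1; one ABBA site plus one BABA site gives D = 0. Because D divides by the number of informative patterns, a small window with few ABBA/BABA sites can produce extreme values, which is the small-region weakness the abstract says d_f is designed to avoid.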
Background: Research over the last 10 years highlights the increasing importance of hybridization between species as a major force structuring the evolution of genomes and potentially providing raw material for adaptation by natural and/or sexual selection. Fueled by work in a few model systems where phenotypic hybrids are easily identified, research into hybridization and introgression (the flow of genes between species) has exploded with the advent of whole-genome sequencing and emerging methods to detect the signature of hybridization at the whole-genome or chromosome level. Among these is a general class of methods that use patterns of single-nucleotide polymorphisms (SNPs) across a tree as markers of hybridization. These methods have been applied to a variety of genomic systems, ranging from butterflies to Neanderthals, to detect introgression; however, when employed at a fine genomic scale, they perform poorly at quantifying introgression in small sample windows. Results: We introduce a novel method to detect introgression by combining two widely used statistics: pairwise nucleotide diversity d_XY and Patterson's D. The resulting statistic, the Basic distance fraction (Bd_f), accounts for genetic distance across possible topologies and is designed to simultaneously detect and quantify introgression. We also relate our new method to the recently published f_d and incorporate these statistics into the powerful genomics R package PopGenome, freely available on CRAN. The supplemental material contains a wide range of simulation studies and a detailed manual on how to compute the statistics within the PopGenome framework. Conclusion: We present a new distance-based statistic, Bd_f, that avoids the pitfalls of Patterson's D when applied to small genomic regions and more accurately quantifies the fraction of introgression (f) for a wide range of simulation scenarios.
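The f_d statistic that both abstracts compare against is, as commonly described (Martin, Davey and Jiggins, 2015), the observed ABBA-minus-BABA excess divided by the excess expected under complete introgression. At each site, the "donor" population P_D is taken to be whichever of P2 and P3 has the higher derived-allele frequency. A minimal sketch under that description, with hypothetical function names and per-site derived-allele frequencies as input:

```python
def s_stat(p1, p2, p3, p4):
    """ABBA minus BABA weight at one site (derived-allele frequencies)."""
    return (1 - p1) * p2 * p3 * (1 - p4) - p1 * (1 - p2) * p3 * (1 - p4)


def f_d(P1, P2, P3, P4):
    """f_d = sum S(P1, P2, P3, O) / sum S(P1, PD, PD, O), PD chosen per site."""
    num = den = 0.0
    for p1, p2, p3, p4 in zip(P1, P2, P3, P4):
        pd = max(p2, p3)  # donor: higher derived frequency among P2 and P3
        num += s_stat(p1, p2, p3, p4)
        den += s_stat(p1, pd, pd, p4)
    return num / den if den != 0 else float("nan")
```

For a single site where P3 is fixed for the derived allele and P2 carries it at frequency 0.5 (P1 and O ancestral), the observed excess is 0.5 and the fully introgressed expectation is 1, so f_d = 0.5. Choosing the donor per site keeps the denominator at least as large as the numerator, which is why f_d, unlike D, behaves as a bounded fraction in small windows.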
Recent advances in DNA sequencing technology, which reduced costs and increased throughput and accuracy, have driven subsequent advances in population genomics methods for detecting traces of natural selection in DNA sequences. In a recombining chromosome, a subgenomic region under natural selection typically exhibits different levels of variation and differentiation than the rest of the genome (Li & Ralph, 2019). Hence, it can be considered an anomaly that deviates from the overall population structure (François, Martins, Caye, & Schoville, 2016; Haasl & Payseur, 2016). Identifying such anomalies in molecular data is important because the footprints of localized natural selection can provide insight into how a population has adapted to its environment over generations. One of the most frequently used statistics for detecting genomic regions under selection is the fixation index (F_ST), which was introduced to quantify population differentiation under the Wright-Fisher model (Wright, 1949). Several F_ST variants are widely used in population genomics (Hudson, Slatkin, & Maddison, 1992; Weir & Cockerham, 1984; Weir & Ott, 1997) because high F_ST values can indicate local adaptation. However, when the population history deviates from the Wright-Fisher model, or when the evolutionary history is described by a hierarchical population structure model, hypothesis testing becomes challenging because the F_ST distribution under the neutral demographic model of the population under study is not known. In this case, F_ST-based methods that do
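Of the F_ST variants cited above, the Hudson, Slatkin and Maddison (1992) form is the simplest to state: F_ST = 1 - H_w / H_b, where H_w is the mean number of pairwise differences between sequences sampled within a population and H_b the mean between populations. The sketch below is one possible implementation for two samples of equal-length haplotypes; averaging the two within-population values with equal weight, and the function names, are assumptions made for illustration.

```python
from itertools import combinations, product


def mean_pairwise_diff(pairs):
    """Average number of differing sites over an iterable of sequence pairs."""
    pairs = list(pairs)
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """F_ST = 1 - Hw / Hb for two samples (each of at least 2 haplotypes)."""
    # Hw: mean within-population pairwise difference, averaged over both samples
    hw = (mean_pairwise_diff(combinations(pop1, 2))
          + mean_pairwise_diff(combinations(pop2, 2))) / 2
    # Hb: mean pairwise difference between one haplotype from each sample
    hb = mean_pairwise_diff(product(pop1, pop2))
    return 1 - hw / hb if hb > 0 else float("nan")
```

Two samples fixed for different alleles at every site (["00", "00"] vs ["11", "11"]) give F_ST = 1, while ["00", "01"] vs ["11", "10"] gives H_w = 1, H_b = 1.5 and F_ST = 1/3. In a genome scan this would be evaluated per window and the highest values flagged; as the paragraph notes, deciding how high is "too high" requires a null distribution that is unknown when the Wright-Fisher assumptions fail.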