Variation of an inherited trait across a population cannot be explained by additive contributions of the relevant genes alone, owing to epigenetic effects and biochemical interactions (epistasis). Detecting epistasis in genomic data remains a significant challenge that requires a better mechanistic understanding of epistasis. Using a standard Wright-Fisher model of a bi-allelic asexual population, we study how compensatory epistasis affects the process of adaptation. The main result is a universal relationship between the four haplotype frequencies of a single pair of sites in a genome, which depends only on the epistasis strength of the pair, defined in terms of Darwinian fitness. We demonstrate that, at any time point, a quasi-equilibrium exists between epistasis and disorder (entropy) caused by random genetic drift and mutation. We verify the accuracy of these analytic results by Monte Carlo simulation over a broad range of parameters, including the topology of the interacting network. Thus, epistasis helps the population cross evolutionary hurdles and leaves marks at the level of haplotype disequilibrium. The method allows one to determine the selection coefficient of each site and the epistasis strength of each pair from a set of sequences. The resulting ability to detect clusters of deleterious mutations close to full compensation is essential for biomedical applications. These findings help to explain the role of epistasis in multiple compensatory mutations underlying viral resistance to antivirals and to the immune response.
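The abstract does not give the fitness function or simulation code. As a rough illustration of the ingredients of such a Monte Carlo experiment, the Python sketch below assumes a hypothetical log-additive fitness in which each deleterious allele costs `s` and each interacting pair carrying both deleterious alleles regains `e`. The names `log_fitness`, `wright_fisher`, `haplotype_freqs` and the parameters `s`, `e`, `mu`, `init_freq` are illustrative assumptions, not the paper's definitions. In particular, `e` is not necessarily the epistasis strength as defined in the study.

```python
import math
import random

def log_fitness(genome, s, e, pairs):
    """Hypothetical log-fitness (an assumption, not the paper's model): each
    deleterious allele (1) costs s, and each interacting pair (i, j) carrying
    both deleterious alleles regains e. With e == 2 * s the double mutant is
    as fit as the wild type (full compensation)."""
    cost = s * sum(genome)
    bonus = e * sum(genome[i] * genome[j] for i, j in pairs)
    return bonus - cost

def wright_fisher(n_pop, n_sites, s, e, pairs, mu, generations, init_freq=0.5, seed=0):
    """Asexual Wright-Fisher dynamics: each generation, parents are drawn with
    replacement in proportion to fitness (selection plus random drift), then
    every site of each offspring flips 0 <-> 1 with probability mu (mutation)."""
    rng = random.Random(seed)
    # Initial population: each site carries the deleterious allele with probability init_freq.
    pop = [[1 if rng.random() < init_freq else 0 for _ in range(n_sites)]
           for _ in range(n_pop)]
    for _ in range(generations):
        weights = [math.exp(log_fitness(g, s, e, pairs)) for g in pop]
        parents = rng.choices(pop, weights=weights, k=n_pop)
        pop = [[1 - k if rng.random() < mu else k for k in parent] for parent in parents]
    return pop

def haplotype_freqs(pop, i, j):
    """Frequencies of the four two-site haplotypes 00, 01, 10, 11 at sites i and j."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for g in pop:
        counts[(g[i], g[j])] += 1
    return {hap: c / len(pop) for hap, c in counts.items()}

# Example: sites 0 and 1 interact; sites 2 and 3 evolve independently.
pop = wright_fisher(n_pop=200, n_sites=4, s=0.1, e=0.15, pairs=[(0, 1)],
                    mu=0.001, generations=50, seed=1)
freqs = haplotype_freqs(pop, 0, 1)
print(freqs)
```

Drawing parents with `random.Random.choices` and fitness weights combines selection and drift in one multinomial resampling step, which is the defining feature of a Wright-Fisher generation. Repeating the run with different seeds gives the ensemble of populations on which such haplotype statistics would be compared.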
Different genomic sites evolve interdependently owing to the combined action of epistasis, defined as a non-multiplicative contribution of alleles at different loci to genome fitness, and the physical linkage of loci in the genome. Both epistasis and linkage, partially offset by recombination, cause correlations between allele frequencies at the loci (linkage disequilibrium, LD). The interaction and competition between epistasis and linkage are not fully understood, nor is their relative sensitivity to recombination. Modeling an adapting population in the presence of random mutation, natural selection, pairwise epistasis, and random genetic drift, we compare the contributions of epistasis and linkage. To this end, we use a panel of haplotype-based measures of LD, and various combinations of them, calculated separately for epistatic and non-epistatic pairs. We compute the optimal percentages of detected and false-positive pairs in a one-time sample of a population of moderate size. We demonstrate that truly interacting pairs can be distinguished in a sufficiently short genome within a narrow window of time and parameters. Outside this parameter region, unless the population is extremely large, the shared ancestry of individual sequences generates pervasive stochastic LD for non-interacting pairs, masking true epistatic associations. In the presence of sufficiently strong recombination, linkage effects decrease faster than those of epistasis, and the detection of epistasis improves. We demonstrate that the epistatic component of locus association can be isolated, at a single time point, by averaging haplotype frequencies over multiple independent populations. These results reveal fundamental restrictions on protocols for detecting true interactions in DNA sequence sets.
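The abstract does not list its panel of LD measures, so the Python sketch below assumes three standard ones (D, r², and Lewontin's D') and shows the averaging step: haplotype frequencies are averaged over independent populations first, and LD is computed afterwards. The function names `ld_measures` and `average_over_populations` are hypothetical, and the replicate frequencies are made-up toy numbers, not data from the study.

```python
def ld_measures(f):
    """Standard two-site LD measures (assumed here; not necessarily the study's panel)
    from haplotype frequencies f = {(0, 0): f00, (0, 1): f01, (1, 0): f10, (1, 1): f11}."""
    p = f[(1, 0)] + f[(1, 1)]  # frequency of allele 1 at the first site
    q = f[(0, 1)] + f[(1, 1)]  # frequency of allele 1 at the second site
    d = f[(1, 1)] * f[(0, 0)] - f[(1, 0)] * f[(0, 1)]  # equals f11 - p*q
    var = p * (1 - p) * q * (1 - q)
    r2 = d * d / var if var > 0 else float("nan")
    # Lewontin's D': D scaled by the largest |D| compatible with p and q (signed here).
    d_max = min(p * (1 - q), (1 - p) * q) if d >= 0 else min(p * q, (1 - p) * (1 - q))
    d_prime = d / d_max if d_max > 0 else float("nan")
    return {"D": d, "r2": r2, "D_prime": d_prime}

def average_over_populations(freq_list):
    """Average each haplotype frequency over independent replicate populations."""
    n = len(freq_list)
    return {hap: sum(f[hap] for f in freq_list) / n for hap in freq_list[0]}

# Toy (made-up) frequencies: two replicates with strong LD of opposite sign.
rep_a = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
rep_b = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}
print(ld_measures(rep_a))  # strong positive LD within one population
print(ld_measures(rep_b))  # strong negative LD within the other
print(ld_measures(average_over_populations([rep_a, rep_b])))  # D vanishes after averaging
```

The toy case shows why averaging helps: stochastic LD produced by shared ancestry has a random sign in each population and tends to cancel, whereas an association driven by epistasis keeps a consistent sign across replicates. Note that averaging the frequencies and then computing D is not the same as averaging the per-population values of D, because D is nonlinear in the frequencies.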
Linkage effects in a multi-locus population strongly influence its evolution. Models based on the traveling-wave approach enable us to predict the average speed of evolution and the statistics of phylogeny. However, statistically predicting the evolution of specific sites and pairs of sites in the multi-locus context remains a mathematical challenge. In particular, the effects of epistasis, the interaction of gene regions contributing to a phenotype, are difficult to predict theoretically and to detect experimentally in sequence data. A large number of false-positive interactions arise from stochastic linkage effects and indirect interactions, which mask true epistatic interactions. Here we develop a proof-of-principle method to filter out false-positive interactions. We start by demonstrating that averaging haplotype frequencies over multiple independent populations is necessary but not sufficient for detecting epistasis, because it still leaves a high number of false-positive interactions. To compensate for the residual stochastic noise, we develop a three-way haplotype method that isolates true interactions. The fidelity of the method is confirmed analytically and on simulated genetic sequences evolved with a known epistatic network. The method is then applied to a large database of neuraminidase protein sequences of influenza A H1N1 obtained from various geographic locations, to infer the epistatic network responsible for the difference between the pre-pandemic virus and the 2009 pandemic strain. These results provide a simple and reliable technique for measuring epistatic interactions of any sign from sequence data.
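The abstract does not describe how its three-way haplotype method works. The Python sketch below is a generic, hypothetical illustration of why three-site haplotypes can expose indirect interactions. It is not the authors' method. It stratifies the sample by the allele at a third site k and recomputes D between sites i and j within each stratum. The function names `pair_D` and `conditional_D` are illustrative, and the genomes are a made-up toy sample.

```python
def pair_D(genomes, i, j):
    """D_ij = f11*f00 - f10*f01 estimated from a list of 0/1 genomes."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for g in genomes:
        counts[(g[i], g[j])] += 1
    n = len(genomes)
    f = {hap: c / n for hap, c in counts.items()}
    return f[(1, 1)] * f[(0, 0)] - f[(1, 0)] * f[(0, 1)]

def conditional_D(genomes, i, j, k):
    """Hypothetical three-site stratification (not the paper's method): D_ij
    recomputed separately among genomes carrying allele 0 and allele 1 at site k."""
    out = {}
    for a in (0, 1):
        subset = [g for g in genomes if g[k] == a]
        out[a] = pair_D(subset, i, j) if subset else float("nan")
    return out

# Toy (made-up) sample: the 0-1 association is mediated entirely by site 2.
genomes = [[0, 0, 0]] * 5 + [[1, 1, 1]] * 5
print(pair_D(genomes, 0, 1))            # nonzero: apparent 0-1 association
print(conditional_D(genomes, 0, 1, 2))  # zero in both strata: the association is indirect
```

In this toy case the apparent association between sites 0 and 1 disappears once site 2 is held fixed, which is the kind of indirect, false-positive interaction the abstract describes. With real sequence data the strata are noisy and smaller, so such a check would need statistical thresholds that this sketch does not attempt.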