Increasing interest in determining the effects of genetic variation for bioengineering, human health and basic biological research has propelled the development of technologies for high-throughput mutagenesis and selection. However, since designing functional assays is challenging and systematic testing of combinations of mutations is intractable, there is a parallel need to develop more accurate computational predictions. Most computational methods have relied significantly on the signal of evolutionary conservation, but do not account for dependencies between positions in a sequence. We present an unsupervised method for predicting the effects of mutations (EVmutation) that explicitly captures residue dependencies between positions. We find that it improves the prediction accuracies of a comprehensive collection of recent high-throughput experimental fitness landscapes, biochemical measurements and human disease mutations. We suggest EVmutation may be useful to assess the quantitative effects of mutations in genes of any organism and provide precomputed predictions for ~7000 human proteins.
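To make the contrast between conservation-only scoring and scoring with dependencies between positions concrete, here is a minimal sketch. It assumes a pairwise model in which a sequence receives a score made of per-site terms plus terms for every pair of positions, and a mutation is scored as the difference between mutant and wild type. The parameters are random toy values rather than values fitted to an alignment, and every name (`make_toy_params`, `score`, `mutation_effect`) is hypothetical, not the EVmutation software's API.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def make_toy_params(length, seed=0):
    """Return random (NOT fitted) site terms h and pairwise coupling terms J."""
    rng = random.Random(seed)
    q = len(ALPHABET)
    h = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(length)]
    J = {
        (i, j): [[rng.gauss(0, 0.1) for _ in range(q)] for _ in range(q)]
        for i in range(length)
        for j in range(i + 1, length)
    }
    return h, J


def score(seq, h, J, use_couplings=True):
    """Sum of site terms plus (optionally) all pairwise terms.

    Assumed convention: higher score means a more favorable sequence.
    """
    idx = [ALPHABET.index(a) for a in seq]
    total = sum(h[i][a] for i, a in enumerate(idx))
    if use_couplings:
        for (i, j), Jij in J.items():
            total += Jij[idx[i]][idx[j]]
    return total


def mutation_effect(wt, mutations, h, J, use_couplings=True):
    """Score difference mutant minus wild type.

    mutations: list of (0-based position, new residue) pairs.
    """
    mutant = list(wt)
    for pos, aa in mutations:
        mutant[pos] = aa
    mutant = "".join(mutant)
    return score(mutant, h, J, use_couplings) - score(wt, h, J, use_couplings)


if __name__ == "__main__":
    wt = "ACDEF"
    h, J = make_toy_params(len(wt))
    m1, m2 = (0, "W"), (3, "K")
    for label, flag in (("site-independent", False), ("pairwise", True)):
        singles = (mutation_effect(wt, [m1], h, J, flag)
                   + mutation_effect(wt, [m2], h, J, flag))
        double = mutation_effect(wt, [m1, m2], h, J, flag)
        print(label, "double minus sum of singles:", double - singles)
```

In the site-independent variant the effect of a double mutant is always the sum of the two single effects. With pairwise terms the double mutant can deviate from that sum, which is the kind of dependence between positions the abstract refers to.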
The functions of proteins and RNAs are defined by the collective interactions of many residues, and yet most statistical models of biological sequences consider sites nearly independently. Recent approaches have demonstrated benefits of including interactions to capture pairwise covariation, but leave higher-order dependencies out of reach. Here we show how it is possible to capture higher-order, context-dependent constraints in biological sequences via latent variable models with nonlinear dependencies. We found that DeepSequence (https://github.com/debbiemarkslab/DeepSequence), a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.
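As a rough illustration of what a latent variable model with nonlinear dependencies looks like, the sketch below draws a latent vector from a standard normal prior, passes it through a small tanh layer, and produces a per-position distribution over amino acids. A mutation is then scored by the log-ratio of the approximate marginal probabilities of mutant and wild type. The weights are random and untrained, the marginal is estimated by plain Monte Carlo over the prior (a simple stand-in chosen for this sketch, not the inference procedure of the DeepSequence code), and all function names are hypothetical.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def make_toy_decoder(length, latent_dim=2, hidden=8, seed=0):
    """Random (untrained) weights: latent -> tanh hidden layer -> per-site logits."""
    rng = random.Random(seed)
    q = len(ALPHABET)
    W1 = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(hidden)]
    W2 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(length * q)]
    return W1, W2


def log_p_x_given_z(seq, z, W1, W2):
    """log p(x | z): sum over positions of the log-softmax of the chosen residue."""
    q = len(ALPHABET)
    hid = [math.tanh(sum(w * zk for w, zk in zip(row, z))) for row in W1]
    total = 0.0
    for i, aa in enumerate(seq):
        logits = [sum(w * hk for w, hk in zip(W2[i * q + a], hid)) for a in range(q)]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += logits[ALPHABET.index(aa)] - log_norm
    return total


def log_marginal(seq, zs, W1, W2):
    """Monte Carlo estimate of log p(x) using samples zs drawn from the prior."""
    vals = [log_p_x_given_z(seq, z, W1, W2) for z in zs]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals)) - math.log(len(vals))


def log_ratio(wt, mut, W1, W2, n_samples=500, latent_dim=2, seed=1):
    """Approximate log p(mutant) - log p(wild type), sharing the same prior samples."""
    rng = random.Random(seed)
    zs = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_samples)]
    return log_marginal(mut, zs, W1, W2) - log_marginal(wt, zs, W1, W2)


if __name__ == "__main__":
    wt = "ACDEF"
    W1, W2 = make_toy_decoder(len(wt))
    print("single mutant:", log_ratio(wt, "WCDEF", W1, W2))
    print("double mutant:", log_ratio(wt, "WCDKF", W1, W2))
```

Because every position depends on the same latent vector through a nonlinear layer, the marginal probability does not factor into per-site or per-pair terms, so the effect of a substitution can depend on the rest of the sequence.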