Adeno-associated virus (AAV) capsids can deliver transformative gene therapies, but our understanding of AAV biology remains incomplete. We generated the complete first-order AAV2 capsid fitness landscape, characterizing all single-codon substitutions, insertions, and deletions across multiple functions relevant for in vivo delivery. We discovered a frameshifted gene in the VP1 region that expresses a membrane-associated accessory protein that limits AAV production through competitive exclusion. Mutant biodistribution revealed the importance of both surface-exposed and buried residues, with a few phenotypic profiles characterizing most variants. Finally, we algorithmically designed and experimentally verified a diverse in vivo targeted capsid library with viability far exceeding random mutagenesis approaches. These results demonstrate the power of systematic mutagenesis for deciphering complex genomes and the potential of empirical machine-guided protein engineering.
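A "first-order" landscape of this kind covers every variant that is one edit away from the wild type. The sketch below enumerates such variants for a short protein segment. It is only an illustration, working at the amino-acid level rather than the codon level, and the function name `first_order_variants` and the tuple layout are hypothetical, not taken from the paper.

```python
# Illustrative only: enumerate single substitutions, insertions, and
# deletions of a protein sequence at the amino-acid level.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def first_order_variants(seq):
    """Return a list of (kind, position, residue, variant_sequence) tuples.

    kind is "sub", "del", or "ins"; position is 0-based. For insertions,
    position is the index before which the new residue is placed.
    Distinct labels can yield the same sequence (e.g. inserting "A"
    on either side of an existing "A").
    """
    variants = []
    for i, wt in enumerate(seq):
        # Substitutions: replace residue i with each of the other 19 residues.
        for aa in AMINO_ACIDS:
            if aa != wt:
                variants.append(("sub", i, aa, seq[:i] + aa + seq[i + 1:]))
        # Deletion: drop residue i.
        variants.append(("del", i, "", seq[:i] + seq[i + 1:]))
    # Insertions: place each residue at every gap, including both ends.
    for i in range(len(seq) + 1):
        for aa in AMINO_ACIDS:
            variants.append(("ins", i, aa, seq[:i] + aa + seq[i:]))
    return variants
```

For a segment of length L this yields 19L labeled substitutions, L deletions, and 20(L + 1) labeled insertions, so the library grows only linearly with length, which is what makes exhaustive characterization feasible.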
Nature provides abundant examples of protein families with highly diverged sequences. The ability to design new protein homologs has many applications, yet synthetic approaches have been unable to generate similarly diverse protein sequences with functional activity in the lab [1, 2]. New technologies offer a solution: high-throughput DNA synthesis and sequencing technologies allow thousands of designed sequences to be assayed in parallel, enabling deep diversification guided by machine learning (ML) models that relate protein sequence to function without detailed biophysical or mechanistic modeling. Here we apply deep learning to design novel adeno-associated virus (AAV) capsid proteins, a challenging target of great utility for gene therapy. Focusing on a 28-amino acid segment spanning buried and exposed regions, we generated 201,426 highly diverse variants of the AAV2 wildtype (WT) sequence, yielding 110,689 viable synthetic capsids, 57,348 of which surpass the average diversity of natural AAV serotype sequences with 12-29 mutations across this region. Even when trained on limited data, deep neural network models accurately predicted capsid viability across highly diverse variants. Deep diversification enables the design of AAV capsids with completely synthetic sequences for the universal treatment of all patients regardless of prior exposure to natural AAV, while demonstrating a general approach that makes vast areas of functional but previously unreachable sequence space accessible.
EK, PJO, NJ, SS, GMC performed research while at Harvard University and EK, SS also performed research while at Dyno Therapeutics. EK, SS, and GMC hold equity at Dyno Therapeutics. A full list of GMC's tech transfer, advisory roles, and funding sources can be found on the lab's website: http://arep.med.harvard.edu/gmc/tech.html. Harvard University has filed a provisional patent application for inventions related to this work. DHB, AB, LJC, PR performed research as part of their employment at Google LLC. Google is a technology company that sells machine learning services as part of its business.
Data availability: Experimental data for all 3 experiments will be deposited on a public repository (NCBI SRA, https://www.ncbi.nlm.nih.gov/sra, id: SUB7629680) by publication date.
Code availability: The TensorFlow 1.3 API was used to implement and train all models using the architectures described in Methods. The training and validation datasets used for creating each model are available as part of the experimental dataset released as described in the preceding section. The code required to construct the A 39 training data and also to synthesize, process, and analyze the experimental data is provided for download, together with IPython notebooks that reproduce the analysis figures from the main text.
Adeno-associated virus (AAV) capsids have shown clinical promise as delivery vectors for gene therapy. However, the high prevalence of pre-existing immunity against natural capsids poses a challenge for widespread treatment. The generation of diverse capsids that are potentially more capable of immune evasion is challenging because introducing multiple mutations often breaks capsid assembly. Here we target a representative, immunologically relevant 28-amino-acid segment of the AAV2 capsid and show that a low-complexity Variational Auto-encoder (VAE) can interpolate in sequence space to produce diverse and novel capsids capable of packaging their own genomes. We first train the VAE on a 564-sample Multiple-Sequence Alignment (MSA) of dependo-parvoviruses, and then further augment this dataset by adding 22,704 samples from a deep mutational exploration (DME) on the target region. In both cases the VAE generated viable variants with many mutations, which we validated experimentally. We propose that this simple approach can be used to optimize and diversify other proteins, as well as other capsid traits of interest for gene delivery.
Proteins are responsible for the most diverse set of functions in biology. The ability to extract information from protein sequences and to predict the effects of mutations is extremely valuable in many domains of biology and medicine. However, the mapping between protein sequence and function is complex and poorly understood. Here we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect protein function. We use this unsupervised approach to cluster natural variants and learn interactions between sets of positions within a protein. This approach generally performs better than baseline methods that consider no interactions within sequences, and in some cases better than the state-of-the-art approaches that use the inverse-Potts model. This generative model can be used to computationally guide exploration of protein sequence space and to better inform rational and automatic protein design.
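To make the comparison concrete, a baseline that "considers no interactions" can be pictured as a site-independent model: each alignment column gets its own residue frequencies, and a mutant is scored by summing per-position log-odds against the wild type. The sketch below is one minimal way to write that down. It is not the authors' implementation; the function names, the pseudocount of 1.0, and the choice to ignore gap characters are all assumptions made for illustration.

```python
# Illustrative site-independent baseline (assumed form, not the paper's code).
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def site_frequencies(msa, pseudocount=1.0):
    """Per-column residue frequencies from equal-length aligned sequences.

    Characters outside AMINO_ACIDS (e.g. gaps) are ignored. The pseudocount
    (an assumed smoothing choice) keeps unseen residues at nonzero probability.
    """
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(s[pos] for s in msa)
        total = sum(counts[aa] for aa in AMINO_ACIDS)
        total += pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total
                      for aa in AMINO_ACIDS})
    return freqs


def log_odds_score(freqs, wildtype, mutant):
    """Sum of log(p(mutant residue) / p(wild-type residue)) over changed sites.

    Each position contributes independently, so the score cannot capture
    interactions between positions.
    """
    return sum(math.log(f[m] / f[w])
               for f, w, m in zip(freqs, wildtype, mutant) if m != w)
```

Because each position contributes independently, the score of a double mutant is exactly the sum of its two single-mutant scores. A model with a shared latent representation, such as a VAE, is not bound by that additivity.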