2021
DOI: 10.1038/s41467-021-22732-w

Protein design and variant prediction using autoregressive generative models

Abstract: The ability to design functional sequences and predict effects of variation is central to protein engineering and biotherapeutics. State-of-art computational methods rely on models that leverage evolutionary information but are inadequate for important applications where multiple sequence alignments are not robust. Such applications include the prediction of variant effects of indels, disordered proteins, and the design of proteins such as antibodies due to the highly variable complementarity determining regio…

Cited by 268 publications (255 citation statements)
References 98 publications (134 reference statements)
“…Interestingly, the PROSS output for all of the input variants (regardless of the extra-glycine position) did not include mutations in CDR1 or its vicinity suggesting that, in contrast to A10 wt, the new variants are stable. We then evaluated the effect of glycine insertion on the selected CDR1 positions by the recently developed method SeqDesign [20], which predicted a higher relative sequence fitness for all of the possible glycine insertion combinations with respect to the A10 wt taken as a reference (its fitness was set to 0, Table S1). The conformational modifications induced in the CDR1 were assessed by analyzing the mutant A10 mutG0 in detail, as this insertion was expected to minimally affect the binding to the antigen.…”
Section: Results
mentioning
confidence: 99%
“…SeqDesign employs word embedding and an autoregressive feed-forward deep generative neural network model for predicting specific amino acid probability at any position in the sequence given the previous amino acids present in the sequence. Thus, full sequence probability is obtained by the product of conditional probabilities on previous characters along a sequence (see Equation (1) in [20]). This allows probability predictions on insertional variants to be dealt with.…”
Section: SeqDesign Analysis of Nanobody Sequences
mentioning
confidence: 99%
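The factorization described in the statement above can be illustrated with a minimal sketch. It is not the SeqDesign implementation: `toy_conditional` is a hypothetical stand-in for a trained network, and `sequence_log_prob` and `relative_fitness` are names chosen here for illustration. The point is only that a product of per-position conditionals (a sum in log space) needs no alignment, so an insertion variant of a different length can be scored against a reference whose relative fitness is 0 by construction.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_conditional(prefix: str) -> dict[str, float]:
    """Hypothetical stand-in for a trained model: P(next residue | prefix).

    Assumption for illustration only: every residue gets weight 1, and the
    residue that ends the prefix gets an extra weight of 1.
    """
    weights = {aa: 1.0 for aa in ALPHABET}
    if prefix:
        weights[prefix[-1]] += 1.0
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


def sequence_log_prob(seq: str, conditional=toy_conditional) -> float:
    """log P(seq) = sum over i of log P(seq[i] | seq[:i])."""
    return sum(
        math.log(conditional(seq[:i])[aa]) for i, aa in enumerate(seq)
    )


def relative_fitness(variant: str, reference: str, conditional=toy_conditional) -> float:
    """Log-probability of a variant relative to a reference (reference scores 0)."""
    return sequence_log_prob(variant, conditional) - sequence_log_prob(reference, conditional)


if __name__ == "__main__":
    reference = "GFT"
    insertion_variant = "GGFT"  # one glycine inserted; length differs from the reference
    print(relative_fitness(reference, reference))
    print(relative_fitness(insertion_variant, reference))
```

Working in log space avoids numerical underflow for long sequences. With a real model, only `toy_conditional` would change; the scoring logic stays the same.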
“…Heterogeneity of data in INDI allows nanobody researchers to obtain an accurate picture of the current state of knowledge of nanobody sequence, structure and function. Such knowledge can then accelerate the development of analytical frameworks (14,15), structural modeling (36), de novo nanobody design protocols (16) and as a basis for deep-learning models addressing nanobody design (17). Altogether we hope that INDI will form a solid data foundation to develop nanobody-specific computational methods that will accelerate development of novel therapeutics in this format.…”
Section: Discussion
mentioning
confidence: 99%
“…By contrast, though nanobodies were discovered close to 30 years ago (11), they attracted less attention in collating data and developing computational protocols addressing these molecules (10). Development of approaches enabling computational design of nanobodies rely on ever deeper analysis of their sequence diversity (12,13), structural conformations (14), antigen-binding preferences (15), attempts at modifying their binding mode (16) and emerging deep-learning methods tackling this format (17). Successful computational protocols addressing nanobodies rely on sound sequence and structure data describing the biology of these molecules.…”
mentioning
confidence: 99%
“…9 Machine learning methods have become increasing popular for protein structure prediction and design problems. 10 Specific to antibodies 11 , machine learning has been applied to predict developability 12 , improve humanization 13 , generate sequence libraries 14 , and predict antigen interactions. 15,16 In this work, we build on advances in general protein structure prediction [17][18][19] to predict antibody F V structures.…”
Section: Introduction
mentioning
confidence: 99%