Protein design and variant prediction using autoregressive generative models

Shin, Jung-Eun; Riesselman, Adam J.; Kollasch, Aaron W.; McMahon, Conor; Simon, Elana; Sander, Chris; Manglik, Aashish; Kruse, Andrew C.; Marks, Debora S.

doi:10.1038/s41467-021-22732-w

Cited by 268 publications

(255 citation statements)

References 98 publications

(134 reference statements)

Supporting

Mentioning

253

Contrasting

“…Interestingly, the PROSS output for all of the input variants (regardless of the extra-glycine position) did not include mutations in CDR1 or its vicinity suggesting that, in contrast to A10 wt, the new variants are stable. We then evaluated the effect of glycine insertion on the selected CDR1 positions by the recently developed method SeqDesign [20], which predicted a higher relative sequence fitness for all of the possible glycine insertion combinations with respect to the A10 wt taken as a reference (its fitness was set to 0, Table S1). The conformational modifications induced in the CDR1 were assessed by analyzing the mutant A10 mutG0 in detail, as this insertion was expected to minimally affect the binding to the antigen.…”

Section: Results

mentioning

confidence: 99%

“…SeqDesign employs word embedding and an autoregressive feed-forward deep generative neural network model for predicting specific amino acid probability at any position in the sequence given the previous amino acids present in the sequence. Thus, full sequence probability is obtained by the product of conditional probabilities on previous characters along a sequence (see Equation (1) in [20]). This allows probability predictions on insertional variants to be dealt with.…”

Section: SeqDesign Analysis of Nanobody Sequences

mentioning

confidence: 99%
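
The factorization described in the statement above (full sequence probability as a product of per-position conditionals) can be illustrated with a minimal sketch. This is not SeqDesign's implementation: `toy_conditional` is a hypothetical stand-in for a trained network, the sequences in the usage note are made up, and `relative_fitness` assumes that fitness is scored as a log-probability difference against a reference whose score is set to 0.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_conditional(prefix, residue):
    """Hypothetical stand-in for a trained model: P(residue | prefix).

    Uniform over the alphabet, except that repeating the previous
    residue is twice as likely as any other choice.
    """
    if not prefix:
        return 1.0 / len(ALPHABET)
    weight = 2.0 if residue == prefix[-1] else 1.0
    total = len(ALPHABET) + 1.0  # 19 residues with weight 1, one with weight 2
    return weight / total


def sequence_log_prob(seq, conditional=toy_conditional):
    """log P(seq) = sum_i log P(seq[i] | seq[:i]), the autoregressive factorization."""
    return sum(math.log(conditional(seq[:i], aa)) for i, aa in enumerate(seq))


def relative_fitness(variant, reference, conditional=toy_conditional):
    """Log-probability of a variant minus that of the reference (reference scores 0)."""
    return sequence_log_prob(variant, conditional) - sequence_log_prob(reference, conditional)
```

Because no alignment is needed, an insertion variant is scored the same way as a substitution, for example `relative_fitness("GFTFGS", "GFTFS")` for a hypothetical one-glycine insertion. Every extra position multiplies in a factor of at most 1, so raw scores of sequences with different lengths are not directly comparable without some form of length handling.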

CDR1 Composition Can Affect Nanobody Recombinant Expression Yields

et al. 2021

The isolation of nanobodies from pre-immune libraries by means of biopanning is a straightforward process. Nevertheless, the recovered candidates often require optimization to improve some of their biophysical characteristics. In principle, CDRs are not mutated because they are likely to be part of the antibody paratope, but in this work, we describe a mutagenesis strategy that specifically addresses CDR1. Its sequence was identified as an instability hot spot by the PROSS program, and the available structural information indicated that four CDR1 residues bound directly to the antigen. We therefore modified the loop flexibility with the addition of an extra glycine rather than by mutating single amino acids. This approach significantly increased the nanobody yields but traded-off with moderate affinity loss. Accurate modeling coupled with atomistic molecular dynamics simulations enabled the modifications induced by the glycine insertion and the rationale behind the engineering design to be described in detail.

“…Heterogeneity of data in INDI allows nanobody researchers to obtain an accurate picture of the current state of knowledge of nanobody sequence, structure and function. Such knowledge can then accelerate the development of analytical frameworks (14,15), structural modeling (36), de novo nanobody design protocols (16) and as a basis for deep-learning models addressing nanobody design (17). Altogether we hope that INDI will form a solid data foundation to develop nanobody-specific computational methods that will accelerate development of novel therapeutics in this format.…”

Section: Discussion

mentioning

confidence: 99%

“…By contrast, though nanobodies were discovered close to 30 years ago (11), they attracted less attention in collating data and developing computational protocols addressing these molecules (10). Development of approaches enabling computational design of nanobodies rely on ever deeper analysis of their sequence diversity (12,13), structural conformations (14), antigen-binding preferences (15), attempts at modifying their binding mode (16) and emerging deep-learning methods tackling this format (17). Successful computational protocols addressing nanobodies rely on sound sequence and structure data describing the biology of these molecules.…”

mentioning

confidence: 99%

INDI – Integrated Nanobody Database for Immunoinformatics

Deszyński, Młokosiewicz, Volanakis et al. 2021

Preprint

Nanobodies, a subclass of antibodies found in camelids, are a versatile molecular binding scaffold composed of a single polypeptide chain. The small size of nanobodies bestows multiple therapeutic advantages (stability, tumor penetration) with the first therapeutic approval in 2018 cementing the clinical viability of this format. Structured data and sequence information of nanobodies will enable the accelerated clinical development of nanobody-based therapeutics. Though the nanobody sequence and structure data are deposited in the public domain at an accelerating pace, the heterogeneity of sources and lack of standardization hampers reliable harvesting of nanobody information. We address this issue by creating the Integrated Database of Nanobodies for Immunoinformatics (INDI, http://research.naturalantibody.com/nanobodies). INDI collates nanobodies from all the major public outlets of biological sequences: patents, GenBank, next-generation sequencing repositories, structures and scientific publications. We equip INDI with powerful nanobody-specific sequence and text search facilitating access to more than 11 million nanobody sequences. INDI should facilitate development of novel nanobody-specific computational protocols helping to deliver on the therapeutic promise of this drug format.

“…[9] Machine learning methods have become increasingly popular for protein structure prediction and design problems [10]. Specific to antibodies [11], machine learning has been applied to predict developability [12], improve humanization [13], generate sequence libraries [14], and predict antigen interactions [15,16]. In this work, we build on advances in general protein structure prediction [17][18][19] to predict antibody Fv structures.…”

Section: Introduction

mentioning

confidence: 99%