Protein-peptide interactions play a fundamental role in many cellular processes, but remain underexplored experimentally and difficult to model computationally. Here, we present PepNN-Struct and PepNN-Seq, structure- and sequence-based approaches for the prediction of peptide binding sites on a protein. A major difficulty in predicting peptide-protein interactions is the flexibility of peptides and their tendency to undergo conformational changes upon binding. Motivated by this, we developed reciprocal attention, which simultaneously updates the encodings of peptide and protein residues while enforcing symmetry, allowing information to flow between the two inputs. PepNN integrates this module with modern graph neural network layers, and a series of transfer learning steps is used during training to compensate for the scarcity of peptide-protein complex information. We show that PepNN-Struct achieves consistently high performance across different benchmark datasets. We also show that PepNN makes reasonable peptide-agnostic predictions, allowing for the identification of novel peptide binding proteins.
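The reciprocal attention idea in the abstract above can be illustrated with a small sketch. This is one plausible reading, not the authors' implementation: it assumes a single attention head, a single score matrix shared by both directions (which is how symmetry is enforced here), and plain Python lists in place of a tensor library. All function and parameter names (`reciprocal_attention`, `w_q`, `w_k`, `w_v_prot`, `w_v_pep`) are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Softmax over each row, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def reciprocal_attention(protein, peptide, w_q, w_k, w_v_prot, w_v_pep):
    """Update protein and peptide encodings from one shared score matrix.

    Assumption: protein residues supply queries and peptide residues supply
    keys; the same scores are reused, transposed, for the reverse direction.
    """
    q = matmul(protein, w_q)  # (n_prot, d)
    k = matmul(peptide, w_k)  # (n_pep, d)
    scale = math.sqrt(len(q[0]))
    # Shared scaled dot-product scores, shape (n_prot, n_pep).
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]

    # Each protein residue attends over peptide residues.
    prot_weights = softmax_rows(scores)
    # Each peptide residue attends over protein residues (same scores, transposed).
    pep_weights = softmax_rows(transpose(scores))

    prot_update = matmul(prot_weights, matmul(peptide, w_v_pep))  # (n_prot, d_v)
    pep_update = matmul(pep_weights, matmul(protein, w_v_prot))   # (n_pep, d_v)
    return prot_update, pep_update
```

Because both updates are derived from the same score matrix, the affinity a protein residue assigns to a peptide residue is the same quantity the peptide residue uses in the other direction; only the normalization axis differs.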
Cys2His2 zinc finger (ZF) domains engineered to bind specific target sequences in the genome provide an effective strategy for programmable regulation of gene expression, with many potential therapeutic applications. However, the structurally intricate engagement of ZF domains with DNA has made their design challenging. Here we describe the screening of 49 billion protein–DNA interactions and the development of a deep-learning model, ZFDesign, that solves ZF design for any genomic target. ZFDesign is a modern machine learning method that models global and target-specific differences induced by a range of library environments and specifically takes into account compatibility of neighboring fingers using a novel hierarchical transformer architecture. We demonstrate the versatility of designed ZFs as nucleases as well as activators and repressors by seamless reprogramming of human transcription factors. These factors could be used to upregulate an allele of haploinsufficiency, downregulate a gain-of-function mutation or test the consequence of regulation of a single gene as opposed to the many genes that a transcription factor would normally influence.
Protein-peptide interactions play a fundamental role in facilitating many cellular processes, but remain underexplored experimentally and difficult to model computationally. Here, we introduce PepNN-Struct and PepNN-Seq, structure- and sequence-based approaches for the prediction of peptide binding sites on a protein given the sequence of a peptide ligand. The models use a novel reciprocal attention module that better reflects the biochemical reality of peptides undergoing conformational changes upon binding. To compensate for the scarcity of peptide-protein complex structural information, we use available protein-protein complex and protein sequence information through a series of transfer learning steps. PepNN-Struct achieves state-of-the-art performance on the task of identifying peptide binding sites, with a ROC AUC of 0.893 and an MCC of 0.483 on an independent test set. Beyond prediction of binding sites on proteins with a known peptide ligand, we also show that the developed models make reasonable peptide-agnostic predictions, allowing for the identification of novel peptide binding proteins.
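For readers unfamiliar with the MCC (Matthews correlation coefficient) reported above, the following minimal sketch shows how it is computed from per-residue binary labels. The function names and the toy labels are illustrative assumptions, not data from the paper; returning 0.0 when the denominator is zero is a common convention rather than part of the definition.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, ranging from -1 to 1."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: four residues, two of which truly bind the peptide.
print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Unlike accuracy, the MCC stays informative when binding residues are a small minority of all residues, which is typical for binding-site prediction.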
Deep learning approaches have spurred substantial advances in the single-state prediction of biomolecular structures. The function of biomolecules is, however, dependent on the range of conformations they can assume. This is especially true for peptides, a highly flexible class of molecules that are involved in numerous biological processes and are of high interest as therapeutics. Here, we introduce PepFlow, a generalized Boltzmann generator that enables direct all-atom sampling from the allowable conformational space of input peptides. We train the model in a diffusion framework and subsequently use an equivalent flow to perform conformational sampling. To overcome the prohibitive cost of generalized all-atom modelling, we modularize the generation process and integrate a hypernetwork to predict sequence-specific network parameters. PepFlow accurately predicts peptide structures and effectively recapitulates experimental peptide ensembles at a fraction of the running time of traditional approaches. PepFlow can additionally be used to sample conformations that satisfy constraints such as macrocyclization.