2022
DOI: 10.1021/acsbiomaterials.1c01343

End-to-End Deep Learning Model to Predict and Design Secondary Structure Content of Structural Proteins

Abstract: Structural proteins are the basis of many biomaterials and key construction and functional components of all life. Further, it is well known that the diversity of proteins' function relies on their local structures derived from their primary amino acid sequences. Here, we report a deep learning model to predict the secondary structure content of proteins directly from primary sequences, with high computational efficiency. Understanding the secondary structure content of proteins is crucial to designing protein…
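To make the prediction target concrete, secondary structure content can be read as the fraction of residues assigned to each structural class. The sketch below is illustrative only and assumes per-residue DSSP labels are available. It uses one common 8-to-3-state reduction (H, G, I as helix; E, B as sheet; everything else as coil). The function name `ss_content` and this reduction are hypothetical choices, not the paper's own pipeline or class scheme.

```python
from __future__ import annotations

from collections import Counter

# Assumed (one common) reduction of 8-state DSSP codes to three classes;
# any code not listed here (T, S, P, '-', ' ') is counted as coil.
DSSP_TO_3 = {"H": "helix", "G": "helix", "I": "helix", "E": "sheet", "B": "sheet"}


def ss_content(dssp: str) -> dict[str, float]:
    """Hypothetical helper: fraction of residues in helix, sheet and coil.

    `dssp` is a per-residue DSSP label string, one character per residue.
    """
    if not dssp:
        raise ValueError("empty DSSP string")
    counts = Counter(DSSP_TO_3.get(code, "coil") for code in dssp)
    n = len(dssp)
    return {cls: counts[cls] / n for cls in ("helix", "sheet", "coil")}


print(ss_content("HHHHEEEE--"))  # {'helix': 0.4, 'sheet': 0.4, 'coil': 0.2}
```

A sequence-to-content model of the kind the abstract describes would regress fractions like these directly from the amino acid sequence, without computing DSSP labels at prediction time.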

Cited by 36 publications (21 citation statements); references 63 publications.

“…With the emergence of deep learning as a powerful tool to solve protein structure and mechanical function prediction problems, one important research frontier is to develop end-to-end tools that do not rely on evolutionary information (e.g., the use of Multiple Sequence Alignments, MSA) or even any structural information. For current end-to-end protein structure predictions, MSA, which is the foundation of many bioinformatic applications, plays a significant role in generating feature sets and determining the success of the prediction tool. AlphaFold, for instance, which is trained with as many protein structures from the Protein Data Bank (PDB) as possible, has shown that its structure prediction accuracy is related to MSA depth (the number of sequences in an MSA).…”
Section: Results (citation type: mentioning; confidence: 99%)
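An MSA-free, end-to-end model consumes only the primary sequence. As an illustration of what such sequence-only input can look like, here is a minimal sketch of one common encoding, a one-hot matrix over the 20 canonical amino acids. The names `one_hot` and `AMINO_ACIDS` are hypothetical, and mapping unknown residues such as `X` to an all-zero row is an assumption; the cited works may encode sequences differently (for example with learned embeddings).

```python
from __future__ import annotations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues, one column each
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> list[list[int]]:
    """Hypothetical encoder: L x 20 one-hot matrix for a protein sequence.

    Assumption: residues outside the 20 canonical ones get an all-zero row.
    """
    rows = []
    for aa in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    return rows


m = one_hot("ACX")
print(len(m), len(m[0]))  # 3 20
```

No alignment search or homolog database is involved, which is what makes this kind of input cheap compared with MSA-derived features.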
“…This problem will be addressed in this paper, offering not only a tool to advance our understanding of the dynamics of proteins but also a toolkit for other applications where there is a need to expose atomistic-level simulation results to a larger range of scales and domains. … mechanical function prediction problems,[33,34] with an important frontier being the development of end-to-end tools[34,35] that do not rely on evolutionary information[36] (e.g., Multiple Sequence Alignments, MSA). In terms of specific algorithms, ML approaches such as feedforward neural networks (FFNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks are generally used for supervised learning,[37] while graph neural networks (GNNs) are widely used for semisupervised tasks.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
“…Machine learning (ML) has emerged as a technique to uncover the underlying physicochemical behavior of biomolecules without specific fundamental chemical or physical laws as inputs. ML approaches, where the unknown scientific relationships are discovered through neural network models, often enable much faster and higher-throughput computation of properties than traditional methods such as molecular dynamics, forming an alternative to conventional modeling or experimental analyses. Recently, our laboratories demonstrated the use of a deep learning method based on natural language processing (NLP), in which computers learn the context of human language, to predict the mechanical stability of collagen sequences and the secondary structure of proteins. Our earlier NLP model ColGen learned the behavior of a collagen data set consisting of 633 sequences and predicted the resulting melting temperatures (Tm), a metric used to quantify the mechanical stability of collagen sequences…”
Section: Introduction (citation type: mentioning; confidence: 99%)
“…Recently, our laboratories demonstrated the use of a deep learning method based on natural language processing (NLP), in which computers learn the context of human language, to predict the mechanical stability of collagen sequences and the secondary structure of proteins.[41,42] Our earlier NLP model ColGen learned the behavior of a collagen data set consisting of 633 sequences and predicted the resulting melting temperatures (Tm), a metric used to quantify the mechanical stability of collagen sequences.[41] In this earlier work, we predicted the Tm values of a large range of collagen sequences to understand, in a high-throughput manner, how mutations affect stability (R² of 0.67 for the test set).…”
Section: Introduction (citation type: mentioning; confidence: 99%)
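The test-set R² of 0.67 quoted for ColGen is the coefficient of determination, R² = 1 − SS_res/SS_tot. As a reminder of what that number measures, here is a minimal sketch of the standard definition. The function name `r_squared` is hypothetical, and the temperatures below are made-up toy values, not ColGen data.

```python
from __future__ import annotations


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares about the mean
    if ss_tot == 0:
        raise ValueError("y_true has zero variance, so R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up melting temperatures (deg C) for illustration only, not ColGen data.
measured = [30.0, 35.0, 40.0, 45.0]
predicted = [32.0, 34.0, 41.0, 43.0]
print(round(r_squared(measured, predicted), 2))  # 0.92
```

An R² of 1 means perfect prediction and 0 means no better than always predicting the mean of the measured values. On held-out data the score can even be negative, so a test-set value such as 0.67 indicates that the model explains roughly two thirds of the variance in the measured targets.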
“…Here we discuss a perspective that focuses on a class of scientific machine learning methods referred to as transformers, applied to problems at the frontier of complexity for which we do not yet have analytical methods at the root, or where such approaches are ineffective. Examples of such problems exist in the space of human language, protein folding, molecular property prediction, or analyses of how complex nonlinear architected materials fail. As visualized in Figure A,B, these problems have in common that they can be described as the interaction of elementary building blocks (atoms, molecules, amino acids, peptides, words, musical notes, etc.) to form more integrated, complex structures with functions that ultimately far exceed those of individual building blocks, defined by their web of interrelated functors…”
Section: Introduction (citation type: mentioning; confidence: 99%)
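In a transformer, the interaction between building blocks is computed by self-attention: each token (an amino acid, a word, a note) scores every other token and takes a weighted average of their representations, softmax(QKᵀ/√d)V. Below is a minimal, dependency-free sketch of single-head scaled dot-product self-attention. It makes the simplifying assumption that the embeddings themselves serve as queries, keys and values, with no learned projection matrices. The toy embeddings are arbitrary, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d)) V.

    Simplifying assumption: Q = K = V = x (no learned projection matrices).
    `x` is an n x d list of token embeddings; returns (outputs, weights), where
    weights[i][j] is how strongly token i attends to token j.
    """
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    weights = [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x])
        for q in x
    ]
    # Each output row is the attention-weighted average of all value vectors.
    outputs = [
        [sum(w[j] * x[j][c] for j in range(n)) for c in range(d)]
        for w in weights
    ]
    return outputs, weights


# Three toy 2-dimensional "building block" embeddings (arbitrary values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens)
```

Every output row mixes information from all tokens at once, which is how a transformer represents pairwise interactions among building blocks rather than processing them strictly in sequence. Real models add learned projections, multiple heads and stacked layers on top of this core operation.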