2019
DOI: 10.1021/acs.jcim.9b00438
|View full text |Cite
|
Sign up to set email alerts
|

To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map

Abstract: Protein sequence profile prediction aims to generate multiple sequences from structural information to advance the protein design. Protein sequence profile can be computationally predicted by energy-based method or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2 has achieved a sequence recovery rate of 34%. However, SPIN2 employed only one dimensional (1D) structural properties that are not sufficient to represent 3D structures. In this study, we represente… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

1
50
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
6
2

Relationship

2
6

Authors

Journals

citations
Cited by 54 publications
(51 citation statements)
references
References 57 publications
1
50
0
Order By: Relevance
“…This last approach reaches state-of-the art prediction accuracy for a single residue given its environment, approximately 57% [56]. This metric should not be confused with the native sequence recovery rate, which compare the whole designed sequence with the native sequence and which is usually between 30 and 40% depending on the test set [49,54,63,65].…”
Section: Position-specific Scoring Matricesmentioning
confidence: 93%
See 2 more Smart Citations
“…This last approach reaches state-of-the art prediction accuracy for a single residue given its environment, approximately 57% [56]. This metric should not be confused with the native sequence recovery rate, which compare the whole designed sequence with the native sequence and which is usually between 30 and 40% depending on the test set [49,54,63,65].…”
Section: Position-specific Scoring Matricesmentioning
confidence: 93%
“…Neural networks designed to predict sequences often chose to output a tensor which can be directly interpreted as a Position-Specific Scoring Matrix [84]. In most cases, these scores are transformed to a probability distribution over amino acids using a "softmax" function at the final layer [50,[54][55][56][57][58]. Softmax combines exponentiation with normalization to transform arbitrary real vectors into vectors that can be interpreted as a multinoulli probability distribution.…”
Section: Position-specific Scoring Matricesmentioning
confidence: 99%
See 1 more Smart Citation
“…However, these methods are mostly based on LSTM and CNN and didn't utilize spatial information of protein molecules. Though our recent studies indicated that protein structure could be well represented by the residue-pairwise distance matrix through CNN (Chen, et al, 2020;Zheng, et al, 2020), the contacted structural information is only implicitly included that can't fully utilize the relations between contacted residues.…”
Section: Introductionmentioning
confidence: 99%
“…Chen and coworkers developed the SPROF method, which uses the two-dimensional map of the pairwise residue distance as the input, and reached an accuracy of 39.8%, representing a 5.2% improvement over that of SPIN2. 41 Yu et al used an interesting approach that translates amino acid sequences into musical compositions and trained a recurrent neural network to generate protein sequences. 42 Greener et al used a variational autoencoder to generate protein sequences conditioned on protein structures.…”
Section: Introductionmentioning
confidence: 99%