2023
DOI: 10.1101/2023.07.05.547769
Preprint

Learning Complete Protein Representation by Deep Coupling of Sequence and Structure

Abstract: Learning effective representations is crucial for understanding proteins and their biological functions. Recent advancements in language models and graph neural networks have enabled protein models to leverage primary or tertiary structure information to learn representations. However, the lack of practical methods to deeply co-model the relationships between protein sequences and structures has led to suboptimal embeddings. In this work, we propose CoupleNet, a network that couples protein sequence and structure…

Cited by 10 publications (11 citation statements). References 43 publications.

Citation statements (ordered by relevance):
“…[64] The structures of active sites in unlabeled protein structures could be compared to existing structures to identify new, diverse sets of proteins with given function, using models trained on sequence and structure. [65] Structures could also be physically modeled to predict their interactions with different substrates. In principle, an ML model could be trained to combine multimodal information such as spatial descriptors of protein structures with an LLM trained on information about chemical reactions.…”
Section: Annotation Of Enzyme Activity Among Known Proteins
Confidence: 99%
“…Alternatively, many enzymes have common “modules,” or recurring residue arrangements, which perform similar reactions. The structures of active sites in unlabeled protein structures could be compared to existing structures to identify new, diverse sets of proteins with given function, using models trained on sequence and structure. Structures could also be physically modeled to predict their interactions with different substrates.…”
Section: Discovery Of Functional Enzymes With Machine Learning
Confidence: 99%
“…Self-supervised learning was further enhanced by leveraging a pretrained protein language model to establish a connection between sequential information in the language model and structural information in the graph neural network. Furthermore, the introduction of CoupleNet [42] presented a network explicitly designed to seamlessly integrate protein sequence and structure, effectively generating informative representations of proteins. This network proficiently combined residue identities and positions from sequences with geometric features derived from tertiary structures.…”
Section: F. Multimodal Representation Learning
Confidence: 99%
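The statement above describes the general recipe behind sequence-structure coupling: per-residue embeddings derived from the sequence are fused with geometric features computed from the tertiary structure. Below is a minimal, hypothetical PyTorch sketch of that fusion pattern; it is not the CoupleNet architecture. The class name SequenceStructureFusion, the k-nearest-neighbor distance features, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): fusing sequence-derived
# residue embeddings with simple geometric features from a structure.
import torch
import torch.nn as nn

class SequenceStructureFusion(nn.Module):
    """Illustrative coupling of sequence and structure signals.

    Residue identities are embedded and concatenated with per-residue
    geometric features (here, distances to the k nearest residues,
    computed from C-alpha coordinates).
    """

    def __init__(self, n_residue_types=20, embed_dim=64, k_neighbors=8):
        super().__init__()
        self.k = k_neighbors
        self.seq_embed = nn.Embedding(n_residue_types, embed_dim)
        self.geom_proj = nn.Linear(k_neighbors, embed_dim)
        self.fuse = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, residue_ids, ca_coords):
        # residue_ids: (L,) integer residue types; ca_coords: (L, 3)
        seq_feat = self.seq_embed(residue_ids)         # (L, D)
        dists = torch.cdist(ca_coords, ca_coords)      # (L, L) pairwise
        # k nearest neighbours, excluding self (distance 0 on diagonal)
        knn_dists, _ = dists.topk(self.k + 1, largest=False)
        geom_feat = self.geom_proj(knn_dists[:, 1:])   # (L, D)
        return self.fuse(torch.cat([seq_feat, geom_feat], dim=-1))

# Toy usage on a random 50-residue "protein"
model = SequenceStructureFusion()
ids = torch.randint(0, 20, (50,))
coords = torch.randn(50, 3)
print(model(ids, coords).shape)  # torch.Size([50, 64])
```

In a real system, the sequence branch would typically come from a pretrained protein language model and the structure branch from a geometric graph neural network; this sketch only shows the fusion step the quoted passage refers to.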
“…Tang et al [62] focus on techniques for detecting text generated by LLMs, while Chang et al [29] have examined the various ways to evaluate LLMs. Additionally, there are a number of surveys dedicated to investigating the specialised applications of Large Models in various fields such as vision [23,24,32,33], education [34-37,63], healthcare [38,39], computational biology [42,43], computer programming [64,65], law [44-46,66], or robotics [47,67,68], among others. On the other hand, our survey stands apart in its exclusive focus on the applications of Large AI Models in the realm of audio signal processing, and fills an existing gap in the current body of research.…”
Section: Speech
Confidence: 99%