2023
DOI: 10.1101/2023.10.13.562298
Preprint

Explainable protein function annotation using local structure embeddings

Alexander Derry,
Russ B. Altman

Abstract: The rapid expansion of protein sequence and structure databases has resulted in a significant number of proteins with ambiguous or unknown function. While advances in machine learning techniques hold great potential to fill this annotation gap, current methods for function prediction are unable to associate global function reliably to the specific residues responsible for that function. We address this issue by introducing PARSE (Protein Annotation by Residue-Specific Enrichment), a knowledge-based method whic…

Cited by 5 publications (5 citation statements)
References 64 publications
“…Recently, there has been an explosion of publications reporting the use of pre-trained Protein Language Models (PLM) to predict protein functions 2933,78–80 (Fig. 2).…”
Section: Results; citation type: mentioning
confidence: 99%
“…2 legend), models have been developed to link protein sequences to vocabularies such as GO terms 28,81,82 or to EC numbers 83. Some models predicting EC numbers have integrated structural data to identify active site signature residues, improving the separation of non-isofunctional paralogous subgroups when these harbor experimentally validated members 17,29,84. While EC prediction models outperform BLAST 78,80,83, they are less accurate than the curated annotation databases such as SwissProt or KEGG 85 and not accurate enough to reliably annotate enzymes for many practical purposes, particularly when trying to reach the level of substrate specificity (or the fourth EC number 86).…”
Section: Results; citation type: mentioning
confidence: 99%
“…The advent of AlphaFold2 and its continuous optimization have significantly improved this situation, allowing for effective analysis of enzyme product specificity through structural analysis. This approach has been extended to the large-scale annotation of enzyme functions [92,93], showcasing the advantages and potential of structural analysis in understanding PdiTPSs. Of course, our analysis has some disadvantages.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…If these proteins could be accurately annotated, protein engineers would have access to a wealth of diverse candidates for engineering. While enzyme engineers have long been using multiple sequence alignments (MSAs) and homology to predict the functions of unannotated protein sequences, ML classification models extend these approaches and draw from more complete features describing protein sequences and structures to predict more specific functions, such as type of reactivity and k_cat. Focusing on known sequences without annotations, many of these methods aim to classify enzyme sequences based on their enzyme commission (EC) numbers, which is a hierarchical classification scheme that divides enzymes into general classes and then further subclasses, based on their catalytic activities (Figure A).…”
Section: Discovery Of Functional Enzymes With Machine Learning; citation type: mentioning
confidence: 99%