2020
DOI: 10.1002/minf.202000033

Using Language Representation Learning Approach to Efficiently Identify Protein Complex Categories in Electron Transport Chain

Abstract: We herein propose a novel approach based on the language representation learning method to categorize electron complex proteins into 5 types. The idea stems from the shared characteristics of human language and protein sequence language; thus, advanced natural language processing techniques were used to extract useful features. Specifically, we employed transfer learning and word embedding techniques to analyze electron complex sequences and create efficient feature sets before using a support vec…
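The abstract is truncated and does not state the embedding model, the "word" definition, or the classifier settings, so the following is only a minimal sketch of the general idea of treating a protein sequence as a sentence. It assumes overlapping 3-mers as "words" and mean pooling of word vectors into one fixed-length feature vector per sequence. The embedding table built by `build_toy_embeddings` is a hypothetical random stand-in for a pretrained (transfer-learned) model, and all function names are illustrative, not the authors' code.

```python
import random
from typing import Dict, Iterable, List

def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mer 'words' (k=3 is an assumption)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_toy_embeddings(vocab: Iterable[str], dim: int = 8, seed: int = 0) -> Dict[str, List[float]]:
    """Hypothetical stand-in for a pretrained embedding table: random vectors per word."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in sorted(vocab)}

def sequence_features(sequence: str, embeddings: Dict[str, List[float]], k: int = 3) -> List[float]:
    """Mean-pool the vectors of all known k-mers into one fixed-length feature vector.

    `embeddings` must be non-empty; sequences with no known k-mers map to a zero vector.
    """
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings[t] for t in kmer_tokens(sequence, k) if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Usage: two illustrative (made-up) sequences of different lengths.
sequences = ["MKTAYIAKQR", "MSLLTEVETPIRNEW"]
vocab = {t for s in sequences for t in kmer_tokens(s)}
embeddings = build_toy_embeddings(vocab, dim=8)
X = [sequence_features(s, embeddings) for s in sequences]
print([len(row) for row in X])
```

Every row of `X` has the same length (here 8) regardless of sequence length, which is what allows the features to be passed to a standard classifier such as a support vector machine (for example scikit-learn's `SVC`) trained on the 5 complex labels. Mean pooling is chosen here only for simplicity; other pooling or sequence-level embeddings are equally plausible.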


Cited by 4 publications (1 citation statement)
References: 35 publications
“…While the use of evolutionary information in the form of position-specific scoring matrix (PSSM) has improved prediction models in biological sequence analysis, the time it takes to generate a PSSM for each protein sequence presents limitations. To address this issue, researchers have explored the application of natural language processing (NLP) algorithms to the study of biological sequences [30][31][32].…”
Section: Using Word Embedding In Protein Sequences · Citation type: mentioning (confidence: 99%)