2020
DOI: 10.1093/bioinformatics/btaa524
TransformerCPI: improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments

Abstract: Motivation Identifying compound-protein interaction (CPI) is a crucial task in drug discovery and chemogenomics studies, and proteins without three-dimensional (3D) structure account for a large part of potential biological targets, which requires developing methods that use only protein sequence information to predict CPI. However, sequence-based CPI models may face some specific pitfalls, including using inappropriate datasets, hidden ligand bias, and splitting datasets inappropriately, result…

Cited by 283 publications (309 citation statements) · References 36 publications
“…Recent studies show that models based on natural language processing inspired techniques such as Transformer, [217] BERT, [218] and GPT-2 [219] can learn features from a large corpus of protein sequences in a self-supervised fashion, with applications in a variety of downstream tasks. [220,221] Besides a linear sequence of amino acids, proteins can also be modeled as a graph to capture both structure and sequence information. Graph neural networks [222] are powerful deep learning architectures for learning representations of nodes and edges from such data.…”
Section: Discussion
confidence: 99%
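The statement above notes that, besides linear sequences, proteins can be modeled as graphs of residues and contacts, with graph neural networks learning node representations by aggregating over neighborhoods. A minimal, purely illustrative sketch of that core aggregation idea (the toy graph, feature values, and function name are all hypothetical, not from the cited works):

```python
# Toy residue-contact graph: each node is a residue, each edge a contact.
contacts = {
    "A1": ["G2"],
    "G2": ["A1", "L3"],
    "L3": ["G2"],
}

# Initial scalar feature per residue (e.g. a hydrophobicity score; values illustrative).
features = {"A1": 1.8, "G2": -0.4, "L3": 3.8}

def message_pass(features, contacts):
    """One neighborhood-aggregation step: each node averages its own
    feature with the mean of its neighbours' features."""
    updated = {}
    for node, neighbours in contacts.items():
        neigh_mean = sum(features[n] for n in neighbours) / len(neighbours)
        updated[node] = 0.5 * (features[node] + neigh_mean)
    return updated

features = message_pass(features, contacts)
```

Real GNN architectures replace the fixed averaging with learned, parameterized transformations and stack several such rounds, but each layer follows this aggregate-and-update pattern.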
“…Its ROC-AUC and PR-AUC are 0.476 and 0.261, respectively. The Transformer-based architecture has achieved state-of-the-art performance in the recently published work [16]. Indeed, it outperforms the LSTM model with a ROC-AUC of 0.648 and a PR-AUC of 0.380, respectively.…”
Section: DISAE Significantly Outperforms State-of-the-art Models
confidence: 95%
“…End-to-end deep learning approaches have recently gained momentum [10] [11]. Various neural network architectures, including Convolutional Neural Network (CNN) [12] [13], seq2seq [14] [15], and Transformer [16], have been applied to represent protein sequences. These works mainly focused on filling in missing CPIs for the existing drug targets.…”
Section: Introduction
confidence: 99%