2020
DOI: 10.1101/2020.08.04.236729
Preprint

Genome-wide Prediction of Small Molecule Binding to Remote Orphan Proteins Using Distilled Sequence Alignment Embedding

Abstract: Endogenous or surrogate ligands of a vast number of proteins remain unknown. Identification of small molecules that bind to these orphan proteins will not only shed new light into their biological functions but also provide new opportunities for drug discovery. Deep learning plays an increasing role in the prediction of chemical-protein interactions, but it faces several challenges in protein deorphanization. Bioassay data are highly biased to certain proteins, making it difficult to train a generalizable mach…

Cited by 1 publication (7 citation statements)
References 38 publications
“…To answer Q1, the same model architecture is trained under the same IID and OOD settings using four procedures: 1) from scratch without any pretraining, i.e., Stage 3 only; 2) Stage 1 whole-Pfam pretraining but not Stage 2 binary DTI classification pretraining, which is equivalent to the DISAE model [2]; 3) Stage 2 only, without Stage 1; and 4) the complete three-stage pretraining/fine-tuning, as in DeepREAL. As shown in Figure 2, on evaluation across the three classes (no-binding, agonist, antagonist), the model without any pretraining (i.e., Stage 3 only) has the worst performance.…”
Section: Results
confidence: 99%
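The four-way ablation quoted above toggles two pretraining stages ahead of a shared fine-tuning stage. A minimal sketch of that experimental grid, with hypothetical names (`AblationSetting`, `run_ablation`, and the `train_fn` signature are illustrative, not the authors' code):

```python
# Hedged sketch of the four ablation settings described in the quoted passage.
# Stage 1 = whole-Pfam pretraining; Stage 2 = binary DTI classification
# pretraining; Stage 3 (agonist/antagonist fine-tuning) is always run.
from dataclasses import dataclass

@dataclass
class AblationSetting:
    name: str
    stage1_pfam_pretrain: bool
    stage2_dti_pretrain: bool

SETTINGS = [
    AblationSetting("scratch (Stage 3 only)", False, False),
    AblationSetting("DISAE-equivalent (Stage 1 only)", True, False),
    AblationSetting("Stage 2 only", False, True),
    AblationSetting("DeepREAL (full three-stage)", True, True),
]

def run_ablation(train_fn):
    """Train one model per setting; train_fn is any callable accepting the
    two pretraining flags and returning a trained model (or its metrics)."""
    results = {}
    for s in SETTINGS:
        results[s.name] = train_fn(
            stage1=s.stage1_pfam_pretrain,
            stage2=s.stage2_dti_pretrain,
        )
    return results
```

Enumerating the grid this way makes the comparison in Figure 2 reproducible: the same architecture and data splits are reused, and only the pretraining flags vary.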
“…There are four data sets involved: Pfam, used to pretrain the protein descriptor [17]; the GLASS data set [3], a large GPCR protein-ligand binary binding data set; agonist/antagonist data downloaded from the International Union of Basic and Clinical Pharmacology/British Pharmacological Society (IUPHAR/BPS) Guide to Pharmacology; and opioid receptor activity data from [19]. Protein descriptor pretraining exactly follows DISAE [2] and hence is not explained in detail here. In brief, DISAE builds a distilled triplet sequence dictionary for the whole of Pfam based on multiple sequence alignments (MSA).…”
Section: Methods
confidence: 99%
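The quoted passage says DISAE distills a triplet sequence dictionary from Pfam MSAs. A minimal sketch of one plausible reading of that idea (this is not the authors' implementation; `distill_triplets` and the gap-fraction threshold are assumptions for illustration): keep mostly non-gap alignment columns, then tokenize each sequence into non-overlapping residue triplets over those columns.

```python
# Hedged sketch (hypothetical, not DISAE's actual code): distill triplet
# tokens from a multiple sequence alignment given as equal-length strings,
# with '-' marking alignment gaps.
from collections import Counter

def distill_triplets(msa_rows, max_gap_frac=0.5):
    """Keep columns whose gap fraction is at most max_gap_frac, then
    tokenize each row into non-overlapping triplets over the kept columns.
    Returns (tokenized rows, triplet vocabulary counts)."""
    ncols = len(msa_rows[0])
    keep = [
        j for j in range(ncols)
        if sum(r[j] == "-" for r in msa_rows) / len(msa_rows) <= max_gap_frac
    ]
    vocab = Counter()
    tokenized = []
    for r in msa_rows:
        residues = [r[j] for j in keep]
        triplets = ["".join(residues[i:i + 3])
                    for i in range(0, len(residues) - 2, 3)]
        vocab.update(triplets)
        tokenized.append(triplets)
    return tokenized, vocab
```

Because tokenization happens over alignment columns rather than raw positions, the same triplet slot refers to homologous residues across all sequences in the family, which is what lets the distilled dictionary transfer across remote homologs.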