Preprint, 2020
DOI: 10.1101/2020.05.11.088237

TripletProt: Deep Representation Learning of Proteins based on Siamese Networks

Abstract: We introduce TripletProt, a new approach for protein representation learning based on Siamese neural networks. We evaluate TripletProt comprehensively on protein functional annotation tasks, including sub-cellular localization (14 categories) and gene ontology prediction (more than 2000 classes), both of which are challenging multi-class, multi-label classification problems. We compare the performance of TripletProt with state-of-the-art approaches including recurrent language model-based a…
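The abstract names the training scheme (a Siamese network trained with a triplet objective) but, in this truncated view, none of the architectural details. The following PyTorch sketch is therefore only a generic illustration of triplet-based protein representation learning; the GRU encoder, all dimensions, and the toy integer-encoded sequences are assumptions, not TripletProt's actual design.

```python
import torch
import torch.nn as nn

class ProteinEncoder(nn.Module):
    """Hypothetical sequence encoder; TripletProt's real architecture may differ."""
    def __init__(self, vocab_size=26, embed_dim=64, hidden_dim=128, out_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        _, h = self.gru(self.embed(x))     # h: (1, batch, hidden_dim)
        return self.proj(h.squeeze(0))     # fixed-size protein embedding

encoder = ProteinEncoder()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Toy integer-encoded sequences. Per the citation statements below, TripletProt
# incorporates protein-protein interaction information when forming triplets;
# random placeholders are used here instead.
anchor   = torch.randint(0, 26, (8, 100))
positive = torch.randint(0, 26, (8, 100))
negative = torch.randint(0, 26, (8, 100))

loss = triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
```

The triplet loss pulls the anchor embedding toward the positive and pushes it away from the negative by at least the margin, which is what lets a fixed-size embedding encode functional similarity.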

Cited by 5 publications (4 citation statements). References 26 publications.

“…Presumably, using a more biologically-motivated proxy task will yield better insights and performance on biological data. Some methods incorporate biological information such as protein-protein interactions (Nourani et al., 2020), or structured labels from SCOP (Bepler & Berger, 2019) and PDB (Gligorijevic et al., 2019); however, high-quality curation of these labels circles back to the need for expensive experiments.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…Protein Language Models. There is growing interest in developing protein language models (pLMs) at the scale of evolution due to the abundance of 1D amino acid sequences, such as the series of ESM (Rives et al., 2019; Lin et al., 2022), TAPE (Rao et al., 2019), ProtTrans (Elnaggar et al., 2021), PRoBERTa (Nambiar et al., 2020), PMLM (He et al., 2021), ProteinLM (Xiao et al., 2021), PLUS (Min et al., 2021), Adversarial MLM (McDermott et al., 2021), ProteinBERT (Brandes et al., 2022), and CARP (Yang et al., 2022a) in masked language modeling (MLM) fashion, ProtGPT2 in causal language modeling fashion, and several others (Melnyk et al., 2022a; Madani et al., 2021; Unsal et al., 2022; Nourani et al., 2021; Lu et al., 2020; Sturmfels et al., 2020; Strodthoff et al., 2020). These protein language models are able to generalize across a wide range of downstream applications and can capture evolutionary information about secondary and tertiary structures from sequences alone.…”
Section: G Related Work
Citation type: mentioning; confidence: 99%
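The excerpt above groups most of the cited pLMs under masked language modeling (MLM). As a point of reference, here is a minimal PyTorch sketch of one MLM training step on protein tokens; the tiny transformer, the vocabulary and mask-token ids, and the 15% masking rate are illustrative assumptions, not any cited model's configuration.

```python
import torch
import torch.nn as nn

VOCAB, MASK_ID = 26, 25                  # assumed amino-acid vocab and mask id
embed = nn.Embedding(VOCAB, 64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(64, VOCAB)              # per-position token predictions

tokens = torch.randint(0, 25, (8, 100))  # toy integer-encoded sequences
mask = torch.rand(tokens.shape) < 0.15   # corrupt ~15% of positions
corrupted = tokens.masked_fill(mask, MASK_ID)

logits = head(encoder(embed(corrupted)))
# The loss is computed only at the masked positions, BERT-style.
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
```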
“…We propose twin-network training of deep-learning models as a potential strategy to increase AC-sensitivity. Comparatively little work has been done to investigate twin neural network architectures (also referred to as Siamese networks [9, 12, 41, 70]) in computational drug discovery [3, 6, 11, 18, 22, 24, 36, 53, 58, 61, 73, 82]. However, twin networks provide a natural way to tackle chemical prediction problems on compound pairs such as AC-classification.…”
Section: Future Research: Exploring Twin-network Training Schemes
Citation type: mentioning; confidence: 99%
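To make the twin-network idea in this excerpt concrete, here is a hedged PyTorch sketch of a shared-weight pair classifier for compound pairs, e.g. activity-cliff (AC) classification; the fingerprint dimension, the encoder, and the way the two embeddings are combined are illustrative choices, not a specific published architecture.

```python
import torch
import torch.nn as nn

class TwinClassifier(nn.Module):
    """Twin (Siamese) network: one encoder, shared weights, applied to both compounds."""
    def __init__(self, fp_dim=2048, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(fp_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, 1)

    def forward(self, fp_a, fp_b):
        za, zb = self.encoder(fp_a), self.encoder(fp_b)
        # Symmetric pair features: |za - zb| and za * zb, so the prediction
        # does not depend on the order of the two compounds.
        pair = torch.cat([torch.abs(za - zb), za * zb], dim=-1)
        return self.classifier(pair).squeeze(-1)   # one logit per pair

model = TwinClassifier()
fp_a, fp_b = torch.rand(16, 2048), torch.rand(16, 2048)  # toy fingerprints
labels = torch.randint(0, 2, (16,)).float()              # 1 = activity cliff
loss = nn.functional.binary_cross_entropy_with_logits(model(fp_a, fp_b), labels)
loss.backward()
```

Weight sharing is the essential design choice: because both compounds pass through the same encoder, the model learns a single embedding space in which pairwise relationships such as activity cliffs become separable.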