2020
DOI: 10.21203/rs.3.rs-40744/v1
Preprint

MULocDeep: A deep-learning framework for protein subcellular and suborganellar localization prediction with residue-level interpretation

Abstract: Prediction of protein localization plays an important role in understanding protein function and mechanism. In this paper, we propose a general deep learning-based localization prediction framework, MULocDeep, which can predict multiple localizations of a protein at both subcellular and suborganellar levels. We collected a dataset with 45 suborganellar localization annotations in 10 major subcellular compartments, the most comprehensive suborganelle localization dataset to date. We also experimentally generate…

citations
Cited by 14 publications
(24 citation statements)
references
References 31 publications
“…Nowadays, many bioinformatics methods for subcellular and sub-organelle localisation are easily findable and accessible [7, 9-11]. Moreover, the recent applications of machine learning (ML) and deep-learning (DL) approaches to encode protein sequences, has shown promising results in several tasks, including subcellular classification [12-18].…”
Section: Introduction (mentioning)
confidence: 99%
“…The current model only applies to sub-Golgi prediction. In the future, we will apply deep presentation learning features for eukaryotic proteins multiple subcellular and suborganellar localization prediction, functioning like DeepLoc (Armenteros et al., 2017) or MULocDeep (Jiang et al., 2020a).…”
Section: Discussion (mentioning)
confidence: 99%
“…In 2019, Alley et al. (2019) proposed a self-supervised and universal protein sequence deep representation learning tool, UniRep, which was trained using UniRef50 (a dataset with tens of millions of protein sequences) to better represent natural and de-novo designed proteins. Also, some other preprint papers such as TAPE (Rao et al., 2019), BiLSTM embedding model (Bepler et al., 2019), PRoBERTa (Nambiar et al., 2020) and MULocDeep (Jiang et al., 2020a) have used similar ideas to encode protein sequences in a deep representations learning way and have obtained good results in many protein-sequence analysis applications.…”
Section: Introduction (mentioning)
confidence: 99%
“…BLOSUM62 is the default matrix for protein BLAST and is among the best for detecting weak protein similarities. Encoding with BLOSUM matrices is fast and provides a viable alternative if acquiring a PSSM is slow or unsuccessful [48], [49].…”
Section: Data and Features (mentioning)
confidence: 99%
“…The pooling operation reduces data dimension by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. It is often desirable to apply CNNs to long protein sequences at the cost of losing single residue resolution for improved computational efficiency [48], [49], [123]. Moreover, CNN filters can be used to build position-weight matrices (PWMs) of sequence motifs, which can improve model interpretability [123].…”
Section: Classification Algorithms (mentioning)
confidence: 99%