2018
DOI: 10.1186/s12859-018-2368-y

ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature

Abstract: Background: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. Besides, most of the previous methods incorporated only a single input feature type, which limits the applicability to the wide functional space. Here, we proposed a novel enzymatic functi…

Cited by 120 publications (116 citation statements)
References 38 publications
“…If not noted otherwise, CNN models are trained on representative sequences, as this considerably reduces the computational burden of determining PSSM features and is in line with the literature, see e.g. [Li et al, 2018, Dalkiran et al, 2018], whereas UDSMProt is conventionally trained using the full training set including redundant sequences, while the corresponding test and validation sets always contain only non-redundant sequences. For the EC50 dataset, redundant sequences enlarge the training set from 45k to 114k sequences for level 1 and from 86k to 170k sequences for level 0.…”
Section: Effect Of Similarity Threshold and Redundant Sequences
Citation type: mentioning (confidence: 95%)
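
The setup quoted above (redundant sequences allowed in training, only cluster representatives in evaluation) can be made concrete with a small split utility. The following is a minimal sketch under assumed inputs, not code from the cited papers: sequence-to-cluster assignments (e.g. from CD-HIT or UniRef-style clustering at a 50% identity threshold) are taken as a plain dict, and the function name split_by_cluster and the split fractions are illustrative placeholders.

import random
from collections import defaultdict

def split_by_cluster(seq_to_cluster, test_frac=0.1, val_frac=0.1, seed=0):
    # seq_to_cluster maps a sequence ID to its similarity-cluster ID.
    clusters = defaultdict(list)
    for seq_id, cluster_id in seq_to_cluster.items():
        clusters[cluster_id].append(seq_id)

    # Shuffle clusters (not individual sequences) so that similar sequences
    # never end up on both sides of a split.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    n_test = int(len(cluster_ids) * test_frac)
    n_val = int(len(cluster_ids) * val_frac)
    test_clusters = set(cluster_ids[:n_test])
    val_clusters = set(cluster_ids[n_test:n_test + n_val])

    train, val, test = [], [], []
    for cluster_id, members in clusters.items():
        if cluster_id in test_clusters:
            test.append(members[0])    # one representative per cluster
        elif cluster_id in val_clusters:
            val.append(members[0])     # one representative per cluster
        else:
            train.extend(members)      # training may keep redundant members
    return train, val, test

Training on representatives only, as done for the CNN baselines in the quote, would amount to replacing train.extend(members) with train.append(members[0]), which corresponds to the smaller 45k (level 1) and 86k (level 0) training sets mentioned above.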
“…This process is repeated for a desired number of iterations. In our experiments we used the same parameters as reported in the literature [Li et al, 2018, Dalkiran et al, 2018, Shen and Chou, 2007], namely three iterations with e_value = 0.001, where e_value is the threshold below which an alignment is considered significant. While the raw sequences from Swiss-Prot contained 26 unique amino acids (20 standard and 6 non-standard), PSSM features are computed only for the 20 standard amino acids.…”
Section: Baseline Model
Citation type: mentioning (confidence: 99%)
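
As a concrete illustration of the quoted parameters, the sketch below wraps NCBI psiblast to produce a per-sequence PSSM with three iterations and an E-value of 0.001. It is an assumption about the tooling rather than the cited papers' exact pipeline: the database name, file paths, and the helper name compute_pssm are placeholders, and depending on the pipeline the 0.001 threshold may instead be passed to psiblast's -inclusion_ethresh option.

import subprocess

def compute_pssm(query_fasta, db="swissprot", out_pssm="query.pssm"):
    # Run PSI-BLAST and write an ASCII PSSM for a single query sequence.
    cmd = [
        "psiblast",
        "-query", query_fasta,
        "-db", db,                    # placeholder database name
        "-num_iterations", "3",       # three iterations, as quoted above
        "-evalue", "0.001",           # significance threshold; some pipelines
                                      # tune -inclusion_ethresh instead
        "-out_ascii_pssm", out_pssm,  # PSSM over the 20 standard amino acids
        "-out", "psiblast_hits.txt",
    ]
    subprocess.run(cmd, check=True)
    return out_pssm

The ASCII PSSM written by -out_ascii_pssm contains scores only for the 20 standard amino acids, matching the note above that non-standard residues are excluded from the PSSM features.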