2017
DOI: 10.1093/bioinformatics/btx680

DEEPre: sequence-based enzyme EC number prediction by deep learning

Abstract: Motivation: Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resources required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. Resu…

citations
Cited by 232 publications
(222 citation statements)
references
References 77 publications
“…There are many successful deep learning applications in protein bioinformatics, [53][54][55] including applications in modeling protein sequences to predict hierarchical labels [56]. There are several innovations in this work.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…If not noted otherwise, CNN models are trained on representative sequences as this considerably reduces the computational burden for determining PSSM features and is in line with the literature, see e.g. [Li et al, 2018, Dalkiran et al, 2018], whereas UDSMProt is conventionally trained using the full training set including redundant sequences, whereas the corresponding test and validation sets always contain only non-redundant sequences. For the EC50 dataset non-redundant sequences enlarge the size of the training set from 45k to 114k and from 86k to 170k sequences for level 1 and level 0 respectively.…”
Section: Effect Of Similarity Threshold and Redundant Sequences
Citation type: mentioning
confidence: 95%
“…This process is repeated to a desired number of iterations. In our experiments we used the same parameters as reported in the literature [Li et al, 2018, Dalkiran et al, 2018, Shen and Chou, 2007], namely three iterations with e_value = 0.001, where e_value relates to the threshold for which an alignment is considered as significant. While the raw sequences from Swiss-Prot contained 26 unique amino acids (20 standard and 6 non-standard amino acids), PSSM features are computed only for the 20 standard amino acids.…”
Section: Baseline Model
Citation type: mentioning
confidence: 99%
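
The parameters quoted in the statement above (three iterations, e_value = 0.001, PSSM features for the 20 standard amino acids only) can be illustrated with a minimal sketch. It assumes the PSSMs are produced with the NCBI BLAST+ `psiblast` tool, which the quoted text does not name, and that e_value corresponds to the `-evalue` option; the quoted description could also refer to `-inclusion_ethresh`. The function names and file names are hypothetical, and the code only builds the command line without running it.

```python
from typing import List, Set

# The 20 standard amino acids (one-letter codes); PSSM columns cover only these.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def psiblast_pssm_command(
    query_fasta: str,
    database: str,
    pssm_out: str,
    iterations: int = 3,      # "three iterations" in the quoted statement
    e_value: float = 0.001,   # "e_value = 0.001" in the quoted statement
) -> List[str]:
    """Build (but do not run) a psiblast call that writes an ASCII PSSM.

    Assumption: e_value is passed as -evalue; it may instead correspond
    to -inclusion_ethresh in the cited works.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", str(iterations),
        "-evalue", str(e_value),
        "-out_ascii_pssm", pssm_out,
    ]


def non_standard_residues(sequence: str) -> Set[str]:
    """Return the residue codes in a sequence that fall outside the 20 standard ones."""
    return set(sequence.upper()) - STANDARD_AMINO_ACIDS


# Hypothetical file and database names, for illustration only.
cmd = psiblast_pssm_command("query.fasta", "swissprot", "query.pssm")
print(" ".join(cmd))
print(non_standard_residues("MKTUX"))
```

The resulting list could be passed to `subprocess.run` on a machine where BLAST+ and a formatted database are available; checking sequences with `non_standard_residues` first makes it explicit which positions will have no PSSM column of their own.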
“…The first step in synthetic pathway design is the identification of relevant genes in existing biosynthetic pathways. Increasingly, sophisticated structure–function prediction algorithms (Li et al, ) and metabolic models (Heavner & Price, ) are used to “bioprospect” in sequence databases to find genes that might be used to construct a desired pathway (Figure ). Once an appropriate set of genes is assembled, transcriptional networks are reconstructed, rewired, or even created de novo to precisely control the expression of the genes and engineer a genetic circuit to produce a specific product (Figure ).…”
Section: Synthetic Biology Approaches To Create Whole Pathway and Reg…
Citation type: mentioning
confidence: 99%