2020
DOI: 10.1093/bioinformatics/btaa609

iPromoter-BnCNN: a novel branched CNN-based predictor for identifying and classifying sigma promoters

Abstract: Motivation: A promoter is a short region of DNA that is responsible for initiating transcription of specific genes. Computational tools for the automatic identification of promoters are in high demand. Promoters can be of different types according to their function. Promoters may show both intra- and inter-class variation and similarity in terms of consensus sequ…

Cited by 42 publications (39 citation statements)
References 45 publications
“…To do this, we obtained Promotech’s predictions on a balanced data set with 2,860 experimentally validated E. coli promoters collected from RegulonDB [26]. This data set has been used to evaluate the performance of several E. coli promoter prediction tools [10, 12, 27]. The average 5-fold cross-validation MCC and accuracy reported on this data set [10, 12, 27] are in the range of [0.498, 0.763] and [0.748, 0.882], respectively.…”
Section: Results (mentioning)
confidence: 99%
“…This data set has been used to evaluate the performance of several E. coli promoter prediction tools [10, 12, 27]. The average 5-fold cross-validation MCC and accuracy reported on this data set [10, 12, 27] are in the range of [0.498, 0.763] and [0.748, 0.882], respectively. RF-HOT achieved on this data set a MCC of 0.54, accuracy of 0.77, AUPRC of 0.845 and AUROC of 0.84; while, RF-TETRA achieved a MCC of 0.47, accuracy of 0.734, AUPRC of 0.830 and AUROC of 0.808.…”
Section: Results (mentioning)
confidence: 99%
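
The statement above compares tools by MCC and accuracy. As a reminder of how those two scores follow from a binary confusion matrix, here is a minimal sketch. The function names and the counts are hypothetical and are not taken from any of the cited papers.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention),
    because the denominator would otherwise be zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts on a balanced set of 100 promoters and 100 non-promoters.
tp, tn, fp, fn = 80, 70, 30, 20
print(round(accuracy(tp, tn, fp, fn), 3))
print(round(mcc(tp, tn, fp, fn), 3))
```

On a balanced data set the two scores tend to move together, but MCC penalizes a classifier that does well on only one of the two classes, which is presumably why both are reported.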
“…The dataset used to train the model is imbalanced, so different techniques, such as Synthetic Minority Oversampling Technique (SMOTE), were used to overcome the problem; however, SMOTE can easily turn the model toward data overfitting. Being inspired from [21] we proposed the use of a cascading binary classifier. The problem identified in the proposed architecture of [21] was that it uses four different encoding schemes and a large number of convolutional filters, which eventually increases both computational cost and complexity.…”
Section: Model Setup (mentioning)
confidence: 99%
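
The statement above mentions a cascading binary classifier without giving its details. One common reading, assumed here, is that a first binary stage separates promoters from non-promoters and each later stage peels off one promoter class. The sketch below illustrates only that control flow; `cascade_predict`, the class labels, and the toy decision rules are all hypothetical stand-ins for trained models.

```python
from typing import Callable, Sequence, Tuple

# A stage pairs a class label with a binary decision function (hypothetical).
Stage = Tuple[str, Callable[[str], bool]]


def cascade_predict(seq: str,
                    is_promoter: Callable[[str], bool],
                    stages: Sequence[Stage],
                    fallback: str) -> str:
    """Run binary classifiers in order and stop at the first one that accepts."""
    if not is_promoter(seq):
        return "non-promoter"
    for label, accepts in stages:
        if accepts(seq):
            return label
    # No class-specific stage accepted the sequence.
    return fallback


# Toy rules standing in for trained binary models (assumptions, not real criteria).
stages = [
    ("class_A", lambda s: s.startswith("G")),
    ("class_B", lambda s: s.count("A") > 5),
]
contains_box = lambda s: "TATAAT" in s

print(cascade_predict("GCTATAATGC", contains_box, stages, "class_C"))
print(cascade_predict("CCCCCCCC", contains_box, stages, "class_C"))
```

Because each stage is a binary problem, class imbalance can be handled per stage rather than across all classes at once, which may be the motivation for preferring a cascade over oversampling.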
“…MULTiPly first identifies the best combination of information features by using an F-score feature selection method; that step is followed by applying five binary classifiers to identify the promoter class. A model named iPromoter-BnCNN proposed by Amin et al utilized four parallel one-dimensional convolutional filters applied to the monomer nucleotide sequence, the trimer nucleotide sequence, and the structural belonging dimers and trimers of the DNA sequence [21]. The dense layer combined all of the extracted features and performed the classification task.…”
Section: Introduction (mentioning)
confidence: 99%
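
The statement above describes parallel branches that read monomer and trimer views of the same DNA sequence. The sketch below shows one plausible way to build those two views: one-hot vectors for single nucleotides and integer ids for overlapping trimers. The exact encodings used by iPromoter-BnCNN are not given in this excerpt, so the scheme and all names here are assumptions, and the structural-property branches are not covered.

```python
from itertools import product

BASES = "ACGT"
# Map each of the 64 trimers to an integer id in lexicographic order (assumed scheme).
TRIMER_INDEX = {"".join(p): i for i, p in enumerate(product(BASES, repeat=3))}


def one_hot_monomers(seq):
    """Encode each nucleotide as a 4-element one-hot vector (A, C, G, T).

    Characters outside ACGT become an all-zero vector.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def trimer_ids(seq):
    """Encode overlapping trimers (stride 1) as integer ids.

    Raises KeyError if a trimer contains a character outside ACGT.
    """
    seq = seq.upper()
    return [TRIMER_INDEX[seq[i:i + 3]] for i in range(len(seq) - 2)]


print(one_hot_monomers("ACGT"))
print(trimer_ids("AAAC"))
```

Each view would then feed its own one-dimensional convolutional branch, with the branch outputs concatenated before the dense layer, as the citing statement describes.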