2014
DOI: 10.1093/nar/gku1019

iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition

Abstract: The σ54 promoters are unique in prokaryotic genome and responsible for transcripting carbon and nitrogen-related genes. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the σ54 promoters. Here, a predictor called ‘iPro54-PseKNC’ was developed. In the predictor, the samples of DNA sequences were formulated by a novel feature vector called ‘pseudo k-tuple nucleotide composition’, which was further op…

Citations: cited by 474 publications (250 citation statements)
References: 95 publications
“…In literature, a set of metrics are often used to measure the prediction quality. To make it intuitive and easy to understand for readers, here we adopt the definition and notations used in [40,41,56-60] to describe the corresponding evaluation metrics: N + . It should be pointed out that the set of metrics above is valid only for the single-label system (such as the case at hand).…”
Section: Experiments II: Identification of DNA-binding Proteins
Citation type: mentioning
confidence: 99%
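The quoted statement breaks off before the metric definitions, so the following is only a sketch of the four metrics usually written in this notation in the sequence-predictor literature (sensitivity, specificity, accuracy and Matthews correlation coefficient). It assumes that N+ and N- are the total numbers of positive and negative samples, and that N+_- and N-_+ are the positives wrongly predicted as negative and the negatives wrongly predicted as positive. The function and argument names are hypothetical and do not come from the cited paper.

```python
import math


def single_label_metrics(n_pos, n_neg, n_pos_missed, n_neg_missed):
    """Sn, Sp, Acc and MCC for a single-label, two-class predictor.

    Assumed meaning of the arguments (not taken from the quoted snippet):
      n_pos        -- N+   : total number of positive samples
      n_neg        -- N-   : total number of negative samples
      n_pos_missed -- N+_- : positives incorrectly predicted as negative
      n_neg_missed -- N-_+ : negatives incorrectly predicted as positive
    """
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("both classes need at least one sample")

    sn = 1 - n_pos_missed / n_pos                           # sensitivity
    sp = 1 - n_neg_missed / n_neg                           # specificity
    acc = 1 - (n_pos_missed + n_neg_missed) / (n_pos + n_neg)

    # MCC written in the same four counts; algebraically equal to the
    # usual TP/TN/FP/FN form.
    numerator = 1 - (n_pos_missed / n_pos + n_neg_missed / n_neg)
    product = (1 + (n_neg_missed - n_pos_missed) / n_pos) * (
        1 + (n_pos_missed - n_neg_missed) / n_neg
    )
    # MCC is undefined when every sample is assigned to one class.
    mcc = numerator / math.sqrt(product) if product > 0 else float("nan")

    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # 100 positives with 10 missed, 100 negatives with 20 missed.
    print(single_label_metrics(100, 100, 10, 20))
```

As the quoted statement notes, these definitions apply only to single-label systems; a multi-label predictor needs a different set of metrics.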
“…As demonstrated by a series of recent publications (Chou 2011; Chen et al 2014; Ding et al 2014; Lin et al 2014; Xu et al 2014; Liu et al 2015) in response to the call (Chou 2011) to establish a really useful sequence-based statistical predictor for a biological system, we need to consider the following procedures: (a) construct or select a valid benchmark dataset to train and test the predictor; (b) formulate the biological sequence samples with an effective mathematical expression that can truly reflect their intrinsic correlation with the target to be predicted; (c) introduce or develop a powerful algorithm (or engine) to operate the prediction; (d) properly perform cross-validation tests to objectively evaluate the anticipated accuracy of the predictor; (e) establish a user-friendly web server for the predictor that is accessible to the public. Below, let us describe how to address these steps one by one.…”
Section: Introduction
Citation type: mentioning
confidence: 94%
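Step (b) in the statement above, turning a sequence into a fixed-length numerical vector, can be illustrated with plain k-tuple nucleotide composition, that is, normalized k-mer frequencies. This is a minimal sketch under stated assumptions: it covers only the k-tuple frequency part and leaves out the additional "pseudo" components that pseudo k-tuple nucleotide composition adds, and the function name and the choice to skip k-mers containing non-ACGT characters are illustrative assumptions rather than the cited paper's method.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return the normalized frequency of every k-mer over A, C, G, T.

    The result always has 4**k entries, ordered lexicographically
    (AA, AC, AG, AT, CA, ... for k=2), so sequences of different
    lengths map to vectors of the same dimension.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")

    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    total = 0
    # Slide a window of width k along the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:      # assumption: ignore windows with N or other symbols
            counts[kmer] += 1
            total += 1

    if total == 0:              # sequence shorter than k, or no valid window
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    vector = ktuple_composition("ACGTACGTTTGA", k=2)
    print(len(vector), sum(vector))
```

A vector of this kind can then be passed to the prediction engine of step (c) and assessed with the cross-validation of step (d).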
“…"Distance Pair" method incorporates the amino acid distance pair coupling information and the amino acid reduced alphabet profile into the general pseudo amino acid composition (PseAAC) [108] vector, which is very useful for analysing DNA-binding proteins [15,170,189,275]. PDT is the abbreviation for "physicochemical distance transformation", which can incorporate considerable sequence-order information or important patterns of protein/peptide sequences into Pseudo components [28], which is very useful for conducting various proteome analyses [17, 23, 215-217, 224, 225, 231, 235, 276-289] and genome analysis as well [216,218,220,223,229,255,277,290].…”
Section: Category Mode
Citation type: mentioning
confidence: 99%