2018
DOI: 10.1038/s41598-018-36308-0

Image-based promoter prediction: a promoter prediction method based on evolutionarily generated patterns

Abstract: Prediction of promoter regions is crucial for studying gene function and regulation. The well-accepted position weight matrix method for this purpose relies on predefined motifs, which would hinder application across different species. Here, we introduce image-based promoter prediction (IBPP) as a method that creates an “image” from training promoter sequences using an evolutionary approach and predicts promoters by matching with the “image”. We used Escherichia coli σ70 promoter sequences to test the performa…

Cited by 31 publications
(31 citation statements)
References 24 publications
(24 reference statements)
“…Support vector machines are applied by Manavalan et al to predict phage virion proteins present in the bacterial genome [25]. Further examples of the application of support vector machines include the work of: Goel et al [12], who propose an improved method for splice site prediction in Eukaryotes; and, Wang et al [36], who introduce the detection of σ70 promoters using evolutionary driven image creation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Differences exist between the negative sets, as our method processes the full genome. This is in contrast to the use of a subsampled negative set by recent studies [29,24,36,23,34,6,29]. In the majority of studies, custom sampling methods are used, where the samples of the negative set is not publicly available [23,29,34,17,6].…”
Section: Benchmarking (mentioning)
confidence: 99%
“…But, if promoter sequences in a training dataset are accurate, this would reduce the noise in the model and afford more accurate prediction. Hence, the computational challenge in applying machine learning to promoter strength prediction lies in the identification of small snippets of nucleotide sequence that strongly correlates with expression level [33]. Currently, a commonly used method for extracting sequence features is position weight matrix [34], but the approach may not be transferable to different species [33].…”
Section: Optimization Of Gene Expression Regulatory Elements (mentioning)
confidence: 99%
“…Given its importance to the community, TSSs annotated with at least 'strong' evidence are retrieved from the RegulonDB database, summing to 6,487 annotated positions. RegulonDB features an up-to-date collection of manual and automated TSSs collected from a plethora of independent sources [27], and has been the chosen data set for recent machine learning implementations [23,19,35,18].…”
Section: Data (mentioning)
confidence: 99%
“…As the nucleotide sequence downstream of the TSS is expected to be of relevance to its identification, labels can be shifted downstream in order to make the information of these regions accessible to the model. In this study, all labels have been shifted downstream by 20 nucleotides, a distance that is used by previous computational methods [23,19,35,18]. Studies on the mechanisms of RNAP binding do furthermore not describe interactions of the RNAP binding process beyond position +20 of the TSS [13,26].…”
Section: Model Architecture (mentioning)
confidence: 99%