2019
DOI: 10.7287/peerj.preprints.27844v1
Preprint

TransPrise: a novel machine learning approach for eukaryotic promoter prediction

Abstract: As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites. In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predi…

Citations: cited by 3 publications (12 citation statements)
References: 53 publications
“…In contrast to previous studies [33, 34], Table 3 shows that regardless of the usage of any additional feature, the performance of the CNN model could in general not be significantly improved. However, these results are in agreement with findings presented in [31, 32, 35, 36] and indicate that the CNN architecture is able to learn specific patterns inherent in the sequences automatically. Hence, these patterns carry information which is obviously redundant to these widely used features.…”
Section: Results (supporting)
confidence: 92%
“…Until now, different machine learning approaches have been developed, which form the core of most computational prediction methods for promoter regions. Whereas in early works the emphasis was on the identification of specific promoter elements (such as TATA boxes, initiator elements (Inrs), downstream promoter elements (DPE) and others) or extraction of k-mer distributions [23, 24, 25, 26, 27, 28, 29, 30], nowadays a more holistic approach is given preference in that whole genomic regions are examined in Convolutional Neural Networks (CNNs), which have been successfully applied in many species [31, 32, 33, 34, 35, 36].…”
Section: Introduction (mentioning)
confidence: 99%
“…The two tools use the same methodology and differ in two aspects: (i) target organism [3PEAT was implemented for an (Arabidopsis thaliana model) and TIPR for a (Mus musculus model)]; and (ii) the classification of the predicted TSS sites of TSS–3PEAT (narrow, broad and weak peak) and TIPR (single and broad peak). Neural network is used by TSSPlant [4] and TransPrise [41]. TransPrise in particular uses convolutional neural networks to improve the prediction performance of the neural network-based model (TSSPlant).…”
Section: Introduction (mentioning)
confidence: 99%
“…Stacking several of these convolution layers together can lead to the detection of nested motifs at larger scales. Pioneering studies illustrated this ability of CNNs to reliably grasp complex combinations of DNA motifs and their relationship with functional regions of the genome [25,34,2,39,19,26].…”
Section: Introduction (mentioning)
confidence: 99%
“…This method performed very well and ranked above state-of-the-art support vector machine based methods. Similar tools were used in different contexts, aiming at identifying promoters [34,26] or detecting splice sites [24,17]. In these approaches, a sample set is first created by taking all positive class sequences (e.g.…”
Section: Introduction (mentioning)
confidence: 99%