2011
DOI: 10.1186/1471-2164-12-s4-s9

Improvement in the prediction of the translation initiation site through balancing methods, inclusion of acquired knowledge and addition of features to sequences of mRNA

Abstract: Background: The accurate prediction of the initiation of translation in sequences of mRNA is an important activity for genome annotation. However, obtaining an accurate prediction is not always a simple task and can be modeled as a problem of classification between positive sequences (protein codifiers) and negative sequences (non-codifiers). The problem is highly imbalanced because each molecule of mRNA has a unique translation initiation site and various others that are not initiators. Therefore, this study fo…
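
The imbalance described in the abstract can be made concrete with a small sketch. It assumes that candidate sites are the ATG codons of a sequence and that exactly one of them is the annotated translation initiation site (TIS); the function names and the toy sequence are hypothetical and are not taken from the paper.

```python
def candidate_sites(mrna: str, codon: str = "ATG") -> list[int]:
    """Return the 0-based position of every occurrence of `codon`.

    Assumption: candidates are plain ATG matches; U is mapped to T so
    that RNA and cDNA notation are both accepted.
    """
    seq = mrna.upper().replace("U", "T")
    k = len(codon)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == codon]


def label_candidates(mrna: str, tis_pos: int) -> list[tuple[int, bool]]:
    """Pair each candidate position with True (the TIS) or False (not a TIS)."""
    return [(pos, pos == tis_pos) for pos in candidate_sites(mrna)]


def imbalance_ratio(labels: list[tuple[int, bool]]) -> float:
    """Number of negative examples per positive example."""
    positives = sum(1 for _, is_tis in labels if is_tis)
    if positives == 0:
        raise ValueError("no candidate matches the annotated TIS position")
    return (len(labels) - positives) / positives


# Toy sequence with four ATG codons; position 7 is treated as the TIS.
toy_mrna = "GGATGCCATGAAATGTTTATGA"
toy_labels = label_candidates(toy_mrna, tis_pos=7)
```

For the toy sequence, `imbalance_ratio(toy_labels)` is 3.0, that is, three non-initiator ATGs for one initiator. Real transcripts are much longer, so the ratio per molecule is typically far larger.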

Cited by 7 publications (4 citation statements)
References 28 publications
“…However, in real data sets imbalances ranging from 100:1 up to 10,000:1 have been reported [ 125 ]. This type of datasets are common in biological studies as well, for example, in the prediction of translation initiation sites [ 126 ] and pre-miRNA classification [ 127 ]. The dataset used in this work is imbalanced because the number of A-rich IRESs in S. cerevisiae used in the training of our SVM (9 sequences) is very small compared to all possible genes containing IRESs (nearly 100,000 for the 20 selected organisms).…”
Section: Methods (mentioning)
confidence: 99%
“…The size of the nucleotide sequence used in training has a direct influence on the quality of the prediction model (Silva et al, 2011;LIU and WONG, 2003). Extraction windows can be symmetric, with the same number of nucleotides in the upstream regions ( region of the sequence before TIS ) and downstream (region after of TIS), or asymmetric, with a number other than nucleotides for each region.…”
Section: Window Size Definition (mentioning)
confidence: 99%
“…Extraction windows can be symmetric, with the same number of nucleotides in the upstream regions ( region of the sequence before TIS ) and downstream (region after of TIS), or asymmetric, with a number other than nucleotides for each region. Preliminary studies indicate that asymmetric-sized windows provide greater accuracy (Silva et al, 2011). We will adopt asymmetric windows in this work being the region upstream with the lowest number of nucleotides.…”
Section: Window Size Definition (mentioning)
confidence: 99%
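
The symmetric and asymmetric extraction windows described in the two statements above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: it assumes the window contains the start codon itself, that positions are 0-based, and that windows running past either end of the sequence are padded with `N`. The function name and parameters are hypothetical.

```python
def extract_window(mrna: str, tis_pos: int, upstream: int,
                   downstream: int, pad: str = "N") -> str:
    """Return `upstream` nucleotides before the codon at `tis_pos`,
    the 3-nucleotide codon itself, and `downstream` nucleotides after it.

    Assumption: missing positions at the sequence ends are filled with
    `pad` so every window has length upstream + 3 + downstream.
    """
    seq = mrna.upper().replace("U", "T")
    if not 0 <= tis_pos <= len(seq) - 3:
        raise ValueError("tis_pos must point at a full codon inside the sequence")
    if upstream < 0 or downstream < 0:
        raise ValueError("window sizes must be non-negative")

    start = tis_pos - upstream          # may be negative
    end = tis_pos + 3 + downstream      # may exceed len(seq)
    left_pad = pad * max(0, -start)
    right_pad = pad * max(0, end - len(seq))
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


toy_mrna = "GGATGCCATGAAATGTTTATGA"

# Asymmetric window: fewer nucleotides upstream than downstream.
asymmetric = extract_window(toy_mrna, tis_pos=7, upstream=3, downstream=5)

# Symmetric window: same count on both sides (padded on the left here).
symmetric = extract_window(toy_mrna, tis_pos=7, upstream=10, downstream=10)
```

With the toy sequence, `asymmetric` is `"GCCATGAAATG"` (3 + 3 + 5 nucleotides) and `symmetric` is `"NNNGGATGCCATGAAATGTTTAT"` (10 + 3 + 10, with three padding characters). Padding keeps all feature vectors the same length, which is one common choice; discarding truncated windows would be another.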