2017
DOI: 10.1186/s12859-017-1594-z

Identification of long non-coding transcripts with feature selection: a comparative study

Abstract: Background: The unveiling of long non-coding RNAs as important gene regulators in many biological contexts has increased the demand for efficient and robust computational methods to identify novel long non-coding RNAs from transcripts assembled with high throughput RNA-seq data. Several classes of sequence-based features have been proposed to distinguish between coding and non-coding transcripts. Among them, open reading frame, conservation scores, nucleotide arrangements, and RNA secondary structure have been u…
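The abstract names open reading frames and nucleotide arrangements among the sequence-based feature classes. As a rough illustration only, and not the feature extraction used in the study or in any specific tool, the Python sketch below computes two such features for a transcript: the length of its longest ORF and its k-mer frequencies. It assumes that only the forward strand is scanned, that an ORF starts at ATG and must end at an in-frame stop codon (TAA, TAG, TGA), and that k = 3 is a reasonable default; the function names are hypothetical. Conservation scores and secondary-structure features need external alignments or folding tools and are not covered.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length (nt, stop codon included) of the longest ATG...stop ORF on the forward strand."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None
        # Walk complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ATG downstream
    return best  # assumption: ORFs without an in-frame stop are ignored


def kmer_frequencies(seq, k=3):
    """Relative frequency of each of the 4**k DNA k-mers (overlapping windows)."""
    seq = seq.upper().replace("U", "T")
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def sequence_features(seq, k=3):
    """Hypothetical feature vector: transcript length, ORF length and coverage, k-mer frequencies."""
    orf = longest_orf_length(seq)
    features = {
        "length": len(seq),
        "orf_length": orf,
        "orf_coverage": orf / len(seq) if seq else 0.0,
    }
    features.update(kmer_frequencies(seq, k))
    return features
```

For example, `sequence_features("GGATGAAATTTTAGCC")` finds the 12-nt ORF ATG AAA TTT TAG in the third reading frame, so `orf_coverage` is 0.75. A classifier would then be trained on such vectors; long transcripts with low ORF coverage are the typical non-coding candidates.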

Cited by 22 publications (13 citation statements); references 68 publications.
“…Each ChIP-seq sample is described by a comprehensive set of 245 quantitative features (QC-metrics), including LM, EM and GM. To remove uninformative features before training the machine learning models, we applied a feature selection procedure similar to [33] on each set of QC-metrics separately. After standardizing each QC quantitative feature (subtracting mean and dividing by standard deviation) for EM, GM and LM separately, we applied hierarchical clustering ("complete linkage") based on the distance measure …”
Section: Feature Correlation and Principal Component Analysis (mentioning)
confidence: 99%
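The selection step quoted above (standardize each feature, cluster the features by complete linkage, keep one feature per cluster) can be sketched in plain Python. The distance measure is elided in the quotation, so this sketch assumes 1 - |r|, with r the Pearson correlation between two standardized features. The cut-off `threshold`, the rule of keeping the lowest-indexed feature of each cluster, and all function names are hypothetical choices for illustration, not the procedure of [33] or of the citing study.

```python
from statistics import mean, pstdev


def standardize(column):
    """z-score a feature column: subtract the mean, divide by the (population) standard deviation."""
    mu, sd = mean(column), pstdev(column)
    # A constant feature has sd == 0; map it to all zeros instead of dividing by zero.
    return [(x - mu) / sd if sd > 0 else 0.0 for x in column]


def pearson_from_z(za, zb):
    """Pearson r of two z-scored columns (population sd): the mean of elementwise products."""
    return sum(x * y for x, y in zip(za, zb)) / len(za)


def complete_linkage_clusters(dist, threshold):
    """Agglomerative clustering with complete linkage, cut at height `threshold`."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is the
                # largest distance between any pair of their members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:  # the closest pair of clusters is too far apart: stop merging
            break
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def select_features(rows, names, threshold=0.2):
    """Keep one representative feature per cluster of highly correlated features.

    rows: list of samples, each a list of feature values in the same order as names.
    threshold: hypothetical cut-off on the assumed distance 1 - |r|.
    """
    columns = [standardize(list(col)) for col in zip(*rows)]
    n = len(columns)
    dist = [[1.0 - abs(pearson_from_z(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    clusters = complete_linkage_clusters(dist, threshold)
    # Hypothetical tie-break: keep the lowest-indexed feature of each cluster.
    keep = sorted(min(c) for c in clusters)
    return [names[i] for i in keep]


if __name__ == "__main__":
    rows = [
        [1.0, 2.0, 3.0],
        [2.0, 4.1, 1.0],
        [3.0, 6.0, 4.0],
        [4.0, 8.2, 2.0],
    ]
    print(select_features(rows, ["f1", "f2_dup", "f3"]))  # ['f1', 'f3']
```

In the toy data `f2_dup` is almost exactly twice `f1` (|r| close to 1), so the two fall into one cluster and only `f1` is kept, while the uncorrelated `f3` stays in its own cluster. Complete linkage is a conservative choice here: a feature joins a cluster only if it is close to every member, not just to one of them.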
“…Long non-coding RNAs (lncRNAs) are gaining attention because of critical biological functions suggested by recent studies (for a review see (Mercer, Dinger & Mattick, 2009)). Some of the ML applications found included the detection of cancer-related lncRNA (Zhang et al, 2018), the discrimination of circular RNAs from other lncRNAs (Chen et al, 2018), and selection of the most informative features of lncRNA (Ventola et al, 2017). Other applications in the RNA field address the identification and clustering of RNA structure motifs (Smith et al, 2017),…”
Section: Architectures and Algorithms Currently Used For TEs or SI (mentioning)
confidence: 99%
“…PRISMA flow diagram
Publication identifier Year Q1 Q2 Q3 Q4 Reference | Publication identifier Year Q1 Q2 Q3 Q4 Reference
P1 X X X X (Yu, Yu & Pan, 2017) | P19 2013 X X X (Loureiro et al, 2013b)
P2 X X X (Schietgat et al, 2018) | P20 2014 X X (Ma, Zhang & Wang, 2014)
P3 X X X (Arango-López et al, 2017) | P21 2010 X X (Dashti & Masoudi-Nejad, 2010)
P4 X X X (Loureiro et al, 2013a) | P22 2010 X X (Ding, Zhou & Guan, 2010)
P5 X X X (Tsafnat et al, 2011) | P23 2019 X (Jaiswal & Krishnamachari, 2019)
P6 X X X (Zhang et al, 2018) | P24 2015 X X X X (Girgis, 2015)
P7 X X X (Eraslan et al, 2019) | P25 2018 X X X (Nakano et al, 2018a)
P8 X X X (Douville et al, 2018) | P26 2018 X X X (Zamith Santos et al, 2018)
P9 X X (Chen et al, 2018) | P27 2009 X (Abrusan et al, 2009)
P10 X X X X (Ashlock & Datta, 2012) | P28 2019 X X (Su, Gu & Peterson, 2019)
P11 X X X (Smith et al, 2017) | P29 2017 X X X X (Nakano et al, 2017)
P12 X X X X (Kamath, De Jong & Shehu, 2014) | P30 2014 X X X (Brayet et al, 2014)
P13 X X X (Kim et al, 2016) | P31 2013 X (Zamani et al, 2013)
P14 X X X (Segal et al, 2018) | P32 2019 X (Hubbard et al, 2019)
P15 X X X (Rawal & Ramaswamy, 2011) | P33 2014 X X (Ryvkin et al, 2014)
P16 X X X (Tang et al, 2017) | P34 2013 X X X X (Zhang et al, 2013)
P17 X X X (Ventola et al, 2017) | P35 2019 X X…”
Section: Figure (mentioning)
confidence: 99%
“…TEs have been found in all organisms and comprise the majority of the nuclear DNA content of plant genomes (Orozco-Arias et al, 2018), such as in wheat, barley and maize. In these species, up to 85% of the sequenced DNA is classified into repeated sequences (Choulet et al, 2014), of which TEs represent the most abundant and functionally relevant type (Ventola et al, 2017). Due to the high diversity of TE structures and transposition mechanisms, there are still numerous classification problems and debates on the classification systems (Piégu et al, 2015).…”
Section: Introduction (mentioning)
confidence: 99%