2022
DOI: 10.1093/bib/bbac392

csORF-finder: an effective ensemble learning framework for accurate identification of multi-species coding short open reading frames

Abstract: Short open reading frames (sORFs) are short nucleotide fragments, no longer than 303 nt, that may encode small peptides. To date, translatable sORFs have been found both in the untranslated regions of messenger RNAs (mRNAs) and in long non-coding RNAs (lncRNAs), where they play vital roles in a wide range of biological processes. As not all sORFs are translated, or even translatable, it is important to develop a highly accurate computational tool for characterizing the coding potential…
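The 303 nt ceiling is 101 codons, i.e. at most 100 amino acids plus a stop codon if the stop is counted in the length. A minimal sketch of scanning a transcript for such candidates, assuming ATG-initiated ORFs on the given strand only, the standard stop codons TAA/TAG/TGA, and only the most upstream in-frame ATG before each stop; `find_sorfs` is a hypothetical helper for illustration, not part of csORF-finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SORF_NT = 303  # ceiling from the abstract (assumption: stop codon included)


def find_sorfs(seq: str, max_len: int = MAX_SORF_NT) -> list[tuple[int, int]]:
    """Hypothetical helper: half-open (start, end) coordinates of ATG..stop
    ORFs in the three forward frames whose length is <= max_len nt.

    Only the most upstream in-frame ATG before each stop is used, so nested
    ATGs are not reported as separate ORFs.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA letters
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon in the length
                if end - start <= max_len:
                    hits.append((start, end))
                start = None  # look for the next ATG downstream
    return sorted(hits)


print(find_sorfs("GGAUGAAAUAGCC"))  # [(2, 11)]
```

A real pipeline would also need a minimum length and, for genomic DNA, the reverse strand; both are left out here for brevity.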

Cited by 17 publications (15 citation statements)
References 49 publications

“…In addition to Arabidopsis thaliana and Fabaceae species, 23 experimentally validated miPEPs found in various plant species are also collected as positive samples of independent testing data set 3. As for negative samples, considering the coding ability of ncRNAs, a common pipeline is to extract sORF sequences of true ncRNAs and then translate them into pseudopeptide sequences. The snRNA and snoRNA sequences are finally selected because no peptides encoded by these RNAs have been discovered; i.e., they have no coding ability. Specifically, the corresponding sequences of various plant species are downloaded from the RNAcentral database, and ORF Finder is then employed to extract pseudopeptide sequences.…”
Section: Materials and Methods (mentioning)
confidence: 99%
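The negative-sample pipeline quoted above extracts sORFs from RNAs assumed to be non-coding (snRNAs, snoRNAs) and translates them into pseudopeptides with NCBI ORF Finder. The sketch below is a self-contained stand-in under simplifying assumptions (forward frames only, ATG start, standard genetic code, 303 nt cap); it is not a reimplementation of ORF Finder, and `pseudopeptides`, `translate` and the toy sequence are hypothetical.

```python
# Hypothetical stand-in for the ORF Finder step, not ORF Finder itself.
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate an in-frame ORF; unknown codons become 'X', the final stop is dropped."""
    protein = "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))
    return protein.rstrip("*")


def pseudopeptides(ncrnas: dict[str, str], max_len: int = 303) -> dict[str, list[str]]:
    """Map each (assumed non-coding) RNA id to pseudopeptides from its ATG..stop sORFs."""
    result = {}
    for name, rna in ncrnas.items():
        seq = rna.upper().replace("U", "T")
        peptides = []
        for frame in range(3):  # forward frames only: the input is a transcript
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    if i + 3 - start <= max_len:
                        peptides.append(translate(seq[start:i + 3]))
                    start = None
        result[name] = peptides
    return result


print(pseudopeptides({"toy_snoRNA": "GGAUGGCUUGGUAAC"}))  # {'toy_snoRNA': ['MAW']}
```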
“…On the one hand, predicting the coding potential of sORFs could be an indication of whether the target sORFs can encode small peptides or not. sORFPred and csORF-finder have been proven to be effective for predicting the coding sORFs. Additionally, unlike the above methods which aim to handle sORF sequences only, several methods have been devised to distinguish between coding and noncoding transcripts.…”
Section: Introduction (mentioning)
confidence: 99%
“…CPPred distinguishes coding RNAs and ncRNAs based on a support vector machine (SVM) classifier and sequence features of RNA [ 24 ], while DeepCPP uses a convolutional neural network (CNN) model [ 25 ]. The csORF-finder evaluated the coding potential of sORFs in multiple species with in-frame sequence features [ 26 ]. Besides methods aimed at humans and other mammals, smORF prediction has also progressed in plants, such as sORFPred [ 27 ], and in prokaryotes, such as PsORFs, sORFPredictor and ProsmORF-pred [ 28–30 ].…”
Section: Introduction (mentioning)
confidence: 99%
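The statement above credits csORF-finder with in-frame sequence features, but their exact definition is not given on this page. The sketch below assumes one simple reading: relative frequencies of k-mers counted only at codon-aligned start positions of the sORF, so that k = 3 gives codon usage and k = 6 gives in-frame hexamer (dicodon) usage. The function name and the default `step` of 3 are illustrative assumptions, not csORF-finder's code.

```python
from collections import Counter
from itertools import product


def in_frame_kmer_features(sorf: str, k: int = 3, step: int = 3) -> list[float]:
    """Assumed in-frame k-mer features: relative frequencies of k-mers whose
    start positions are codon-aligned (0, 3, 6, ...) in the sORF.

    Returns a vector of length 4**k in lexicographic ACGT order; k-mers
    containing other symbols (e.g. N) are skipped.
    """
    seq = sorf.upper().replace("U", "T")
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    valid = set(vocab)
    counts = Counter(
        seq[i:i + k] for i in range(0, len(seq) - k + 1, step) if seq[i:i + k] in valid
    )
    total = sum(counts.values()) or 1  # avoid division by zero for very short input
    return [counts[km] / total for km in vocab]


features = in_frame_kmer_features("AUGAAAUAG")  # codons AUG, AAA, UAG
print(len(features), round(sum(features), 6))  # 64 1.0
```

Counting only codon-aligned positions keeps the features tied to the reading frame, which is the property the word "in-frame" suggests; an overlapping count (`step=1`) would mix all three frames.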
“…However, the subsequence embedding method ignores the sequential information at joins between subsequences. Inspired by the i-framed features mentioned in csORF-Finder [32], we propose a novel approach called NOLTE to represent ncRNA sequences as low-dimensional dense vectors. An overview of embedding generation is illustrated in Figure 2.…”
Section: Introduction (mentioning)
confidence: 99%
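The NOLTE statement notes that subsequence embeddings ignore the information at joins between subsequences. Assuming, for illustration only, a subsequence embedding that splits a sequence into non-overlapping windows and counts k-mers inside each window (this is not NOLTE's or csORF-finder's method), the k-mers spanning each join are never seen. The hypothetical helpers below list them.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def windowed_kmers(seq: str, k: int, w: int) -> list[str]:
    """k-mers counted separately inside non-overlapping windows of length w."""
    return [km for s in range(0, len(seq), w) for km in kmers(seq[s:s + w], k)]


def kmers_lost_at_joins(seq: str, k: int, w: int) -> list[str]:
    """k-mers present in the full sequence but missed by windowing (they span a join)."""
    missing = Counter(kmers(seq, k)) - Counter(windowed_kmers(seq, k, w))
    return sorted(missing.elements())


print(kmers_lost_at_joins("ACGTACGT", k=3, w=4))  # ['GTA', 'TAC']
```

In this toy case the single join hides k − 1 = 2 k-mers.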