MicroRNA categorization using sequence motifs and k-mers

Yousef, Malik; Khalifa, Waleed; Acar, İlhan E.; Allmer, Jens

doi:10.1186/s12859-017-1584-1

Cited by 29 publications

(31 citation statements)

References 36 publications

“…Therefore, only sequence-based features were used for parameterization in this study. Sequence motifs (200) as in [31] were used as well as 84 k-mers and their information-theoretic transformations (91). In the following, the parameters used in this study are detailed.…”

Section: Parameterization of Pre-miRNAs (mentioning)

confidence: 99%
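The k-mer parameterization mentioned in the statement above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the 84 k-mers are all words of length 1 to 3 over the alphabet A, C, G, U (4 + 16 + 64 = 84) and that each count is normalized by the number of windows of that length. The function names `kmer_vocabulary` and `kmer_features` are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumption: RNA alphabet; DNA input is converted below


def kmer_vocabulary(max_k=3):
    """Return all k-mers for k = 1..max_k (84 words when max_k is 3)."""
    return [
        "".join(word)
        for k in range(1, max_k + 1)
        for word in product(ALPHABET, repeat=k)
    ]


def kmer_features(sequence, max_k=3):
    """Map each k-mer to its relative frequency in the sequence.

    Normalizing by the number of windows of length k is an assumption
    made for this sketch; the source does not state the normalization.
    """
    seq = sequence.upper().replace("T", "U")
    counts = Counter()
    for k in range(1, max_k + 1):
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1

    features = {}
    for kmer in kmer_vocabulary(max_k):
        windows = len(seq) - len(kmer) + 1
        features[kmer] = counts[kmer] / windows if windows > 0 else 0.0
    return features


if __name__ == "__main__":
    vector = kmer_features("UGAGGUAGUAGGUUGUAUAGUU")
    print(len(vector), vector["UAG"])
```

Each hairpin sequence then becomes a fixed-length vector of 84 values, which can be fed to any standard classifier regardless of sequence length.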

“…All hairpins were filtered for sequence similarity as in Yousef et al [31] before training machine learning models using the Usearch tool [47].…”

Section: Datasets (mentioning)

confidence: 99%

“…Therefore, both positive and negative classes for training and testing were derived from known pre-miRNAs, effectively removing the need for pseudo negative data. We have previously proposed the same strategy [31] using sequence motifs and k-mers. In this study, we further introduced information-theoretic approaches and important additional analyses.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, we realized that using positive examples to represent the negative class from different species holds a number of promises [31]. One of the promises is that it enables the categorization of pre-miRNAs into species.…”

Section: Introduction (mentioning)

confidence: 99%

“…Due to the large impact of k-mers on the categorization in our previous work, in an attempt to add more discriminating power, here, we added information theory (IT)-based features. Apart from our previous study [31], only Lopes and colleagues attempted to use pre-miRNAs to discriminate between species [33]. However, they resorted to establishing ab initio pre-miRNA detection models with the same bias on negative data as existing pre-miRNA detection methods [26, 39-42]; using the same training and testing strategies [32, 42-44].…”

Section: Introduction (mentioning)

confidence: 99%

Categorization of species based on their microRNAs employing sequence motifs, information-theoretic sequence feature extraction, and k-mers

Yousef, Nigatu, Levy, et al. 2017

EURASIP J. Adv. Signal Process.

Self Cite

Background: Diseases like cancer can manifest themselves through changes in protein abundance, and microRNAs (miRNAs) play a key role in the modulation of protein quantity. MicroRNAs are used throughout all kingdoms and have been shown to be exploited by viruses to modulate their host environment. Since the experimental detection of miRNAs is difficult, computational methods have been developed. Many such tools employ machine learning for pre-miRNA detection, and many features for miRNA parameterization have been proposed. To train machine learning models, negative data is important yet hard to come by; therefore, we recently started to employ pre-miRNAs from one species as positive data versus another species' pre-miRNAs as negative examples based on sequence motifs and k-mers. Here, we introduce the additional usage of information-theoretic (IT) features.

Results: Pre-miRNAs from one species were used as positive and another species' pre-miRNAs as negative training data for machine learning. The categorization capability of IT and k-mer features was investigated. Both feature sets and their combinations yielded a very high accuracy, which is as good as the previously suggested sequence motif and k-mer based method. However, obtaining high performance requires a sufficiently large phylogenetic distance between the species and a sufficiently high number of pre-miRNAs in the training set. To examine the contribution of the IT and k-mer features, an information gain-based feature ranking was performed. Although the top 3 are IT features, 80% of the top 100 features are k-mers. The comparison of all three individual approaches (motifs, IT, and k-mers) shows that k-mers are sufficient for distinguishing species based on their pre-miRNAs.

Conclusions: IT sequence feature extraction enables the distinction among species and is less computationally expensive than motif calculations. However, since IT features need larger amounts of data to have enough statistics for producing highly accurate results, future categorization into species can be effectively done using k-mers only. The biological reasoning for this is the existence of a codon bias between species which can, at least, be observed in exonic miRNAs. Future work in this direction will be the ab initio detection of pre-miRNA. In addition, prediction of pre-miRNA from RNA-seq can be done.
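The information gain-based feature ranking named in the abstract can be illustrated with a small sketch. The abstract does not say how continuous features were discretized or which tool was used, so the binary split at each feature's mean is an assumption made here, and the names `entropy`, `information_gain`, and `rank_features` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional


def rank_features(table, labels):
    """Rank features by information gain, highest first.

    `table` maps a feature name to one value per sample. Splitting at the
    feature mean is an assumption of this sketch, not the source's method.
    """
    scores = {}
    for name, values in table.items():
        threshold = sum(values) / len(values)
        scores[name] = information_gain(values, labels, threshold)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: two species, four hairpins, two k-mer frequency features.
    labels = ["species_a", "species_a", "species_b", "species_b"]
    table = {"AU": [0.1, 0.2, 0.8, 0.9], "GC": [0.5, 0.5, 0.5, 0.5]}
    for name, gain in rank_features(table, labels):
        print(name, round(gain, 3))
```

A feature that separates the two species perfectly receives the full class entropy as its gain, while a feature that is constant across samples receives zero.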

Ensemble Classifiers for Multiclass MicroRNA Classification

Odenthal, Allmer, Yousef 2021

Methods in Molecular Biology

44 Current Challenges in miRNomics

Akgül, Stadler, Hawkins, et al. 2021

Methods in Molecular Biology

Self Cite

Mature microRNAs (miRNAs) are short RNA sequences about 18-24 nucleotides long, which provide the recognition key within RISC for the posttranscriptional regulation of target RNAs. Considering the canonical pathway, mature miRNAs are produced via a multistep process. Their transcription (pri-miRNAs) and first processing step via the microprocessor complex (pre-miRNAs) occur in the nucleus. Then they are exported into the cytosol, processed again by Dicer (dsRNA) and finally a single strand (mature miRNA) is incorporated into RISC (miRISC). The sequence of the incorporated miRNA provides the function of RNA target recognition via hybridization. Following binding of the target, the mRNA is either degraded or translation is inhibited, which ultimately leads to less protein production. Conversely, it has been shown that binding within the 5′ UTR of the mRNA can lead to an increase in protein product. Regulation of homeostasis is very important for a cell; therefore, all steps in the miRNA-based regulation pathway, from transcription to the incorporation of the mature miRNA into RISC, are under tight control. While much research effort has been exerted in this area, the knowledgebase is not sufficient for accurately modelling miRNA regulation computationally. The computational prediction of miRNAs is, however, necessary because it is not feasible to investigate all possible pairs of a miRNA and its target, let alone miRNAs and their targets. We here point out open challenges important for computational modelling or for our general understanding of miRNA-based regulation and show how their investigation is beneficial. It is our hope that this collection of challenges will lead to their resolution in the near future.
