2015
DOI: 10.1080/07391102.2015.1014422

iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach

Abstract: A microRNA (miRNA) is a small non-coding RNA molecule, functioning in transcriptional and post-transcriptional regulation of gene expression. The human genome may encode over 1000 miRNAs. Albeit poorly characterized, miRNAs are widely deemed as important regulators of biological processes. Aberrant expression of miRNAs has been observed in many cancers and other disease states, indicating that they are deeply implicated with these diseases, particularly in carcinogenesis. Therefore, it is important for both ba…

Citations: cited by 130 publications (88 citation statements)
References: 81 publications
“…Later on the concept of PseAAC was extended to cover all the feature vectors of proteins (Chou, 2009, 2011). Furthermore, the concept of PseAAC has been extended to deal with DNA/RNA sequences (Lin et al, 2014; Liu et al, 2015b, 2015c). Because it has been widely and increasingly used in many areas of computational biology, recently a web server called 'Pse-in One' was established to generate various modes of pseudocomponents (Liu et al, 2015e), which is the first web server ever that can generate nearly all the features of pseudocomponents of DNA, RNA, and protein sequences in one package.…”
Section: Introduction (mentioning)
confidence: 99%
“…Only the jackknife test is the least arbitrary that can always yield a unique result for a given benchmark dataset [54,55]. That is why researchers have a preference for the jackknife test for examining the quality of various machine learning based predictors such as [30,31,46]. Hence, we also use the jackknife test and independent dataset test to evaluate the accuracy of the current predictor in this work.…”
Section: Performance Evaluation (mentioning)
confidence: 99%
“…In order to compare with previous works, we use the benchmark dataset in the works of Liu et al [27,28,29,30] and Khan et al [31], which consists of positive samples (true pre-microRNAs) and negative samples (pseudo pre-microRNAs). As in the above works, we derived the positive samples from the miRBase (released on 20 June, 2013) [32], which is composed of 1872 experimentally confirmed pre-microRNA sequences of Homo sapiens.…”
Section: Datasets (mentioning)
confidence: 99%
“…1) The occurrences of kmers, allowing at most m mismatches (Mismatch) [264,265,292]
2) The occurrences of kmers, allowing non-contiguous matches (Subsequence) [265,292,293]
Autocorrelation
3) Moran autocorrelation (MAC) [217,294]
4) Geary autocorrelation (GAC) [217,295]
5) Normalized Moreau-Broto autocorrelation (NMBAC) [217,296]
Predicted structure composition
6) Local structure-sequence triplet element (Triplet) [266]
7) Pseudo-structure status composition (PseSSC) [226]
8) Pseudo-distance structure status pair composition (PseDPC) [10]

2) PseAAC of Distance-Pairs and Reduced Alphabet (Distance Pair) [271]
Autocorrelation
3) Physicochemical distance transformation (PDT) [270]
Profile-based features
4) Select and combine the n most frequent amino acids according to their frequencies (Top-n-gram) [269]
5) Profile-based Physicochemical distance transformation (PDT-Profile) [270]
6) Distance-based Top-n-gram (DT) [271]
7) Profile-based Auto covariance (AC-PSSM) [272]
8) Profile-based Cross covariance (CC-PSSM) [272]
9) Profile-based Auto-cross covariance (ACC-PSSM) [272]

Mismatch [264] and Subsequence [265]; and 3 are added into the autocorrelation category, i.e., Moran autocorrelation, Geary autocorrelation, and Normalized Moreau-Broto autocorrelation [268]. PseAAC-General is designed to generate the feature vectors for protein sequences.…”
Section: Category Mode (mentioning)
confidence: 99%
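
The "Mismatch" mode listed in the statement above counts k-mer occurrences while tolerating up to m mismatches. One common reading of that description can be sketched as follows: every length-k window of the sequence contributes to each k-mer that lies within Hamming distance m of it. This is an illustrative brute-force version under that assumption, not the implementation used by the cited tool; the function names and the RNA alphabet default are hypothetical.

```python
from itertools import product
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_profile(seq: str, k: int, m: int, alphabet: str = "ACGU") -> Dict[str, int]:
    """Count, for every possible k-mer, the windows of seq within m mismatches of it."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for kmer in kmers:
            if hamming(window, kmer) <= m:
                counts[kmer] += 1
    return counts


# With m=0 this reduces to plain k-mer counting; m=1 spreads each window
# over its one-mismatch neighbours as well.
exact = mismatch_profile("ACGU", k=2, m=0)
profile = mismatch_profile("ACGU", k=2, m=1)
print(exact["AC"], profile["AC"], profile["AG"])
```

The brute-force loop costs on the order of (sequence length) x (alphabet size)^k comparisons, which is acceptable for small k but would need a smarter neighbourhood enumeration for larger values.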
“…This is because almost all the existing machine-learning algorithms, such as "Neural Network" or NN algorithm [1-3], "Support Vector Machine" or SVM algorithm [4-12], "Nearest Neighbor" or NN algorithm [13,14], and "Random Forest" algorithm [15-22], can only handle vectors but not sequence samples as elucidated in a review paper [23]. Unfortunately, if using the sequential model, i.e., the model in which all the samples are represented by their original sequences, it is hardly able to train a machine learning model that can cover all the possible cases concerned, as elaborated in [24].…”
Section: Introduction (mentioning)
confidence: 99%
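
The point made in the statement above, that learning algorithms need fixed-length vectors rather than raw sequences, can be illustrated with the simplest such encoding: a normalized k-mer composition. The sketch below is only an assumption-laden example of the general idea; it is not the pseudo distance-pair composition of iMiRNA-PseDPC, and the function name and example sequences are made up.

```python
from itertools import product
from typing import List


def kmer_composition(seq: str, k: int = 2, alphabet: str = "ACGU") -> List[float]:
    """Map a sequence of any length to a vector of len(alphabet)**k k-mer frequencies."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vector = [0.0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:  # skip windows containing symbols outside the alphabet
            vector[pos] += 1.0
            total += 1
    if total:
        vector = [v / total for v in vector]
    return vector


# Two sequences of different lengths end up in the same 16-dimensional space.
short_vec = kmer_composition("ACGU")
long_vec = kmer_composition("ACGUACGUGGCA")
print(len(short_vec), len(long_vec))
```

Plain composition discards all long-range order information, which is the gap that pseudo-component encodings such as those cited above are meant to address.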