2011
DOI: 10.1186/1471-2105-12-450
Random KNN feature selection - a fast and stable alternative to Random Forests

Abstract: Background: Successfully modeling high-dimensional data involving thousands of variables is challenging. This is especially true for gene expression profiling experiments, given the large number of genes involved and the small number of samples available. Random Forests (RF) is a popular and widely used approach to feature selection for such "small n, large p" problems. However, Random Forests suffers from instability, especially in the presence of noisy and/or unbalanced inputs. Results: We present RKNN-FS, an inn…
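The abstract sketches the RKNN idea: an ensemble of KNN classifiers, each built on a random subset of the features, with features scored by how well the subsets containing them perform. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the subset size `m` and the leave-one-out accuracy used as the "support" score here are assumed simplifications of the paper's criterion.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, features, k=3):
    """Classify x by majority vote of its k nearest training points,
    measuring squared Euclidean distance over the chosen feature subset only."""
    dists = sorted(
        (sum((tx[f] - x[f]) ** 2 for f in features), label)
        for tx, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def rknn_feature_support(X, y, n_models=200, m=2, k=3, seed=0):
    """Score each feature by the mean leave-one-out accuracy of the
    random-subset KNN classifiers it participates in (assumed scoring rule)."""
    rng = random.Random(seed)
    p = len(X[0])
    acc_sum, count = [0.0] * p, [0] * p
    for _ in range(n_models):
        feats = rng.sample(range(p), m)  # random feature subset for this base KNN
        correct = sum(
            knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], feats, k) == y[i]
            for i in range(len(X))
        )
        acc = correct / len(X)
        for f in feats:  # credit every feature in the subset with this accuracy
            acc_sum[f] += acc
            count[f] += 1
    return [acc_sum[f] / count[f] if count[f] else 0.0 for f in range(p)]

# Tiny synthetic check: feature 0 separates the two classes, 1 and 2 are noise.
data_rng = random.Random(1)
X = [[i % 2 + data_rng.uniform(-0.2, 0.2),
      data_rng.uniform(0, 1),
      data_rng.uniform(0, 1)] for i in range(20)]
y = [i % 2 for i in range(20)]
scores = rknn_feature_support(X, y)
```

On this toy data the informative feature receives the highest support score, because every random subset containing it yields an accurate KNN classifier, while subsets of pure noise features hover near chance.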

Cited by 106 publications (66 citation statements). References 32 publications.
“…In Drzewiecki (2016b) nine machine learning (ML) regression algorithms were tested: Cubist (Quinlan, 1993), Random Forest (RF) (Breiman, 2001), stochastic gradient boosting of regression trees (GBM) (Friedman, 2002), k-nearest neighbors (kNN), random k-nearest neighbors (rkNN) (Li et al, 2011), Multivariate Adaptive Regression Splines (MARS) (Friedman, 1991), averaged neural networks (avNN) (Ripley, 1996), support vector machines (Smola and Schölkopf, 2004) with polynomial (SVMp) and radial (SVMr) kernels. For every study area, each of them was used to predict imperviousness for both mid 1990s and late 2000s.…”
Section: Detection of Relevant Changes (mentioning; confidence: 99%)
“…Random forests (RF) is one of the most important supervised methods for feature gene selection (16-18). During the classifying process, RF returns several measures of variable importance.…”
Section: Methods (mentioning; confidence: 99%)
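The excerpt above notes that RF returns several measures of variable importance during classification. As a hedged illustration (using scikit-learn's `RandomForestClassifier` on synthetic data, not the cited study's setup), the built-in mean-decrease-in-impurity scores can be read off after fitting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic "small n, large p"-flavored data: with shuffle=False, the
# 3 informative features occupy columns 0-2 and the rest are noise.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
imp = rf.feature_importances_  # one mean-decrease-in-impurity score per feature
```

The importances sum to one, and the informative columns should collectively dominate the noise columns, which is the ranking signal feature-selection pipelines built on RF typically exploit.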
“…With regard to classification model selection, different algorithms have been studied for the identification of differentially expressed genes in genomic data. Classification methods such as Multilayer Perceptron (NN) [23], [24], [15], Support Vector Machines (SVM) [25], Naive Bayes (NB) [26], k-Nearest Neighbour (kNN) [27], Decision Trees (DT) [28], and RF (Random Forest) [29] have been used in recent studies. Finally, prediction assessment refers to the performance of the predictive models.…”
Section: Introduction (mentioning; confidence: 99%)