Feature Subset Selection Problem using Wrapper Approach in Supervised Learning

Karegowda, Asha Gowda; Manjunath, A. S.; Jayaram, M. A.

doi:10.5120/169-295

Cited by 121 publications

(76 citation statements)

References 10 publications


“…We used the wrapper approach [40] where all subsets of features are evaluated using a given machine learning approach. The models resulting from the wrapper with each machine learning algorithm i were $i = 1, 2, \ldots, 2^n$, where n is the number of predictive factors.…”

Section: Methods (mentioning)

confidence: 99%

Modelling Predictors of Molecular Response to Frontline Imatinib for Patients with Chronic Myeloid Leukaemia

et al. 2017


Background: Treatment of patients with chronic myeloid leukaemia (CML) has become increasingly difficult in recent years due to the variety of treatment options available and the challenge of deciding on the most appropriate treatment strategy for an individual patient. To facilitate the treatment strategy decision, disease assessment should involve molecular response to initial treatment for an individual patient. Patients predicted not to achieve major molecular response (MMR) at 24 months to frontline imatinib may be better treated with alternative frontline therapies, such as nilotinib or dasatinib. The aims of this study were to i) understand the clinical prediction ‘rules’ for predicting MMR at 24 months for CML patients treated with imatinib using clinical, molecular, and cell count observations (predictive factors collected at diagnosis and categorised based on available knowledge) and ii) develop a predictive model for CML treatment management. This predictive model was developed by treating the challenge as a machine learning problem, based on CML patients undergoing imatinib therapy enrolled in the TIDEL II clinical trial, with an experimentally identified group achieving MMR and a group not achieving MMR. The recommended model was validated externally using an independent data set from King Faisal Specialist Hospital and Research Centre, Saudi Arabia.

Principal Findings: The common prognostic scores yielded similar sensitivity performance in testing and validation datasets and are therefore good predictors of the positive group. The G-mean and F-score values in our models outperformed the common prognostic scores in testing and validation datasets and are therefore good predictors for both the positive and negative groups. Furthermore, a high PPV above 65% indicated that our models are appropriate for making decisions at diagnosis and pre-therapy. Study limitations include that prior knowledge may change based on varying expert opinions; hence, representing the category boundaries of each predictive factor could dramatically change performance of the models.
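The citation statement for this entry describes a wrapper that evaluates every subset of n predictive factors with a given learning algorithm. A minimal sketch of that idea follows, under stated assumptions: the nearest-centroid learner, the leave-one-out scoring, and all function names are illustrative choices, not the method of the cited study. The empty subset is skipped here, so the loop scores 2^n - 1 candidate models rather than 2^n.

```python
from itertools import combinations


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (assumed learner)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    # sorted() makes tie-breaking between labels deterministic
    return min(sorted(sums), key=squared_distance)


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of the learner restricted to the columns in subset."""
    projected = [[row[j] for j in subset] for row in X]
    hits = 0
    for i in range(len(projected)):
        train_X = projected[:i] + projected[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, projected[i]) == y[i]
    return hits / len(projected)


def exhaustive_wrapper(X, y):
    """Score every non-empty feature subset; return (best, all_scored)."""
    n = len(X[0])
    scored = []
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            scored.append((loo_accuracy(X, y, subset), subset))
    # Prefer higher accuracy, then smaller subsets
    best = max(scored, key=lambda item: (item[0], -len(item[1])))
    return best, scored


# Toy data (made up): column 0 separates the classes, columns 1 and 2 are noise
X = [[0.0, 5.0, 1.0], [0.2, 1.0, 9.0], [0.1, 9.0, 4.0],
     [1.0, 4.0, 2.0], [0.9, 8.0, 8.0], [1.1, 0.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]
best, scored = exhaustive_wrapper(X, y)
print(best, len(scored))
```

Because the number of candidate subsets doubles with each added factor, this exhaustive form is only practical for a small number of predictive factors.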


“…The number of features (attributes) and number of instances in the raw dataset can be enormously large. Feature selection must be conducted to identify and remove irrelevant features [9]. Feature selection aims to maximize classification accuracy.…”

Section: Feature Selection and Ensemble Methods (mentioning)

confidence: 99%

A Feature Selection-based Ensemble Method for Arrhythmia Classification

Namsrai, Munkhdalai, Li et al. 2013

Journal of Information Processing Systems


Abstract: In this paper, a novel method is proposed to build an ensemble of classifiers by using a feature selection schema. The feature selection schema identifies the best feature sets that affect the arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then classification models are built using each feature subset. Finally, we combine the classification models by adopting a voting approach to form a classification ensemble. The voting approach in our method uses both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble. In our method, the feature selection rate depends on the order in which the feature subsets are extracted. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers based on the top three feature subsets and formed the classifier ensemble by using the voting approach. Our method can improve the classification accuracy on high-dimensional datasets. The performance of each classifier and the performance of their ensemble were higher than the performance of the classifier that was based on the whole feature space of the dataset. The classification performance was improved and a more stable classification model could be constructed with the proposed approach.
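The abstract says each classifier's vote is scored from its classification error rate and a feature selection rate that depends on extraction order, but it does not give the formula. The sketch below is therefore only an assumed stand-in: `classifier_score`, its rank-based selection rate, and the product form are hypothetical, chosen to show how such a weighted vote could be combined.

```python
from collections import defaultdict


def classifier_score(error_rate, order, n_subsets):
    """Hypothetical score: accuracy scaled by a rank-based selection rate.

    order is 0 for the first extracted feature subset, 1 for the second, etc.
    This formula is an assumption, not the one used in the cited paper.
    """
    selection_rate = (n_subsets - order) / n_subsets
    return (1.0 - error_rate) * selection_rate


def weighted_vote(predictions, scores):
    """Return the label with the largest total score across classifiers."""
    totals = defaultdict(float)
    for label, score in zip(predictions, scores):
        totals[label] += score
    # sorted() makes tie-breaking between labels deterministic
    return max(sorted(totals), key=lambda label: totals[label])


# Three classifiers built on the first, second, and third extracted subsets
# (error rates are made-up illustration values)
scores = [classifier_score(0.2, 0, 3),
          classifier_score(0.1, 1, 3),
          classifier_score(0.3, 2, 3)]
print(weighted_vote(["arrhythmia", "normal", "normal"], scores))
```

With these assumed weights, two lower-ranked classifiers that agree can outvote the top-ranked one, which is the behaviour a score-weighted ensemble is meant to allow.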


“…Wrapper method implants the model hypothesis search within the feature subset search [8]. Once a number of subsets of feature are obtained, each subset is to be evaluated with the classifier [33]. It has more risk of overfitting and is more computationally exhaustive.…”

Section: Feature Selection Methods For Gene Expression Data (mentioning)

confidence: 99%

Predictive Analysis of Lung Cancer Recurrence

Srivastava, Rathi, Gupta 2011

Advances in Computing and Communications


Abstract. The paper is about the predictive analysis of lung cancer recurrence based on non-small cell lung cancer carcinoma gene expression data using data mining and machine learning techniques. Prediction is one of the most significant factors in statistical analysis. Predictive analysis is a term describing a variety of statistical and analytical techniques used to develop models that predict future events or behaviours. Prediction of cancer recurrence has been a challenging problem for many researchers. The proposed method involves four phases: data collection, gene selection, designing the classifier model, statistical parameter calculation, and finally the comparison with previous results. The major part of the method is the gene selection and classification. A hybrid method for gene selection and classification is used for statistical analysis of lung cancer recurrence. The most suitable techniques are used for this work on the basis of a comparative analysis of different classification methods and optimization techniques.
