A Feature Selection Based on Relevance and Redundancy

Introduction: Obstructive sleep apnea syndrome has become an important public health concern. Polysomnography is traditionally considered an established and effective diagnostic tool providing information on the severity of obstructive sleep apnea syndrome and the degree of sleep fragmentation. However, the numerous steps in the polysomnography test to diagnose obstructive sleep apnea syndrome are costly and time-consuming. This study aimed to test the efficacy and clinical applicability of different machine learning methods based on demographic information and questionnaire data to predict obstructive sleep apnea syndrome severity.

Materials and methods: We collected data about demographic characteristics, spirometry values, gas exchange (PaO2, PaCO2) and symptoms (Epworth Sleepiness Scale, snoring, etc.) of 313 patients with a previous diagnosis of obstructive sleep apnea syndrome. After principal component analysis, we selected 19 variables, which were used for further preprocessing and eventually to train seven types of classification models and five types of regression models to evaluate the prediction ability of obstructive sleep apnea syndrome severity, represented either by class or by apnea–hypopnea index. All models are trained with an increasing number of features and the results are validated through stratified 10-fold cross validation.

Results: Comparative results show the superiority of support vector machine and random forest models for classification, while support vector machine and linear regression are better suited to predict apnea–hypopnea index. Also, a limited number of features is enough to achieve the maximum predictive accuracy. The best average classification accuracy on test sets is 44.7 percent, with the same average sensitivity (recall). In only 5.7 percent of cases, a severe obstructive sleep apnea syndrome (class 4) is misclassified as mild (class 2). Regression results show a minimum achieved root mean squared error of 22.17.

Conclusion: The problem of predicting apnea–hypopnea index or severity classes for obstructive sleep apnea syndrome is very difficult when using only data collected prior to the polysomnography test. The results achieved with the available data suggest the use of machine learning methods as tools for providing patients with a priority level for the polysomnography test, but they still cannot be used for automated diagnosis.

“…FCBF [19] (fast correlation-based filter), an entropy-based measure, which also identifies redundancy due to pairwise correlations between features.…”

Section: Feature Ranking and Selection (mentioning)

confidence: 99%
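
The quoted statement calls FCBF an entropy-based measure without naming it. In the usual description of FCBF the measure is symmetric uncertainty, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)); treating that as the measure meant here is an assumption. A minimal sketch for discrete features, with illustrative function names that do not come from any of the cited papers:

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant; treat as uninformative
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (hx + hy)
```

A value of 1 means either variable fully determines the other, and 0 means the two are independent. The same function can score a feature against the class (relevance) or against another feature (pairwise redundancy).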

Application of machine learning to predict obstructive sleep apnea syndrome severity

Mencar

Gallo

Mantero

et al. 2019

Health Informatics J

“…The first type of categorization that is commonly mentioned in this literature is regarding the features. Features can be divided into three main categories: Strongly relevant features, weakly relevant features, irrelevant features (Yu and Liu, 2004; Yu et al., 2021). Strongly relevant features are the essential features that should not be removed during a feature selection process.…”

Section: Categorization In Feature Selection Algorithms (mentioning)

confidence: 99%

Machine learning and feature selection: Applications in economics and climate change

Akyapı

2023

Environ. Data Science

Feature selection is an important component of machine learning for researchers that are confronted with high dimensional data. In the field of economics, researchers are often faced with high dimensional data, particularly in the studies that aim to understand the channels through which climate change affects the welfare of countries. This work reviews the current literature that introduces various feature selection algorithms that may be useful for applications in this area of study. The article first outlines the specific problems that researchers face in understanding the effects of climate change on countries’ macroeconomic outcomes, and then provides a discussion regarding different categories of feature selection. Emphasis is placed on two main feature selection algorithms: Least Absolute Shrinkage and Selection Operator and causality-based feature selection. I demonstrate an application of feature selection to discover the optimal heatwave definition for economic outcomes, enhancing our understanding of extreme temperatures’ impact on the economy. I argue that the literature in computer science can provide useful insights in studies concerned with climate change as well as its economic outcomes.

“…The Fast Correlation-Based Filter (FCBF) algorithm is one of the feature selection frameworks which makes use of the relevance and redundancy between features to determine a suitable feature subset with low feature redundancy for high-dimensional data set [23]. The processing flow of the FCBF algorithm for selecting feature set is shown in Fig.…”

Section: Principles a The Fcbc Framework (mentioning)

confidence: 99%
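
The relevance-then-redundancy flow described in the statement above can be sketched roughly as follows. This is a simplified illustration, not the cited paper's implementation. It assumes discrete features, uses symmetric uncertainty as the correlation measure, and the names `fcbf_select`, `features`, `target` and `threshold` are hypothetical.

```python
from collections import Counter
from math import log2


def _entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def _su(x, y):
    """Symmetric uncertainty between two discrete sequences."""
    hx, hy = _entropy(x), _entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2.0 * (hx + hy - _entropy(list(zip(x, y)))) / (hx + hy)


def fcbf_select(features, target, threshold=0.0):
    """Return selected feature names, most relevant first.

    features: dict mapping feature name -> list of discrete values
    target:   list of class labels, same length as each feature column
    """
    # Relevance step: keep features whose SU with the class exceeds the threshold.
    relevance = {name: _su(col, target) for name, col in features.items()}
    candidates = sorted(
        (name for name, su in relevance.items() if su > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    while candidates:
        best = candidates.pop(0)
        selected.append(best)
        # Redundancy step: drop every remaining feature that is at least as
        # correlated with the chosen feature as it is with the class.
        candidates = [
            name for name in candidates
            if _su(features[best], features[name]) < relevance[name]
        ]
    return selected
```

For example, with `target = [0, 0, 1, 1]` and features `a = [0, 0, 1, 1]`, `a_copy = [1, 1, 0, 0]` and `noise = [0, 1, 0, 1]`, the sketch keeps only `a`: `noise` fails the relevance threshold and `a_copy` is removed as redundant with `a`.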

A Runway Detection Method Based on Classification Using Optimized Polarimetric Features and HOG Features for PolSAR Images

et al. 2020

A novel runway detection algorithm for PolSAR (Polarimetric Synthetic Aperture Radar) images based on optimized polarimetric features and local spatial information is proposed. Existing methods for runway detection in PolSAR images typically use parallel lines as the primary feature. However, many other ground objects, such as rivers and roads, also have parallel structures, which degrades the performance of these detection methods. The proposed method is based on two stages of classification, with polarimetric features and the HOG (Histogram of Oriented Gradient) feature, and avoids the interference caused by similar morphological features among different ground objects. An FCBF (Fast Correlation Based Filter) is first used to optimize and select the polarimetric features of ground targets. An RF (Random Forest) classifier is then employed to extract ROIs (Regions of Interest) which may contain runways. HOG features are then extracted from these ROIs for further classification with an SVM (Support Vector Machine) to detect the runway area. Experimental results with measured PolSAR data provided by the NASA UAVSAR project show that the proposed method can detect runway regions effectively without using parallel lines. A comparative analysis against algorithms based on the parallel-line pattern is also conducted, and the results suggest that the proposed method is effective and improves performance.

Cited by 10 publications

References 6 publications
