Determination of an optimal feature selection method based on maximum Shapley value

Mokdad, Fatiha; Bouchaffra, Djamel; Zerrouki, Nabil; Touazi, Azzedine

doi:10.1109/isda.2015.7489211

Cited by 10 publications

(8 citation statements)

References 13 publications


“…The forward elimination method achieved the highest accuracy among the experimental comparison algorithms. In 2016, Mokdad et al designed a feature selection algorithm structure derived from the Shapley value [44]. First, the rank of N groups of features was obtained by N feature selection algorithms, and then the Borda Count method was adopted to determine the ultimate feature rank.…”

Section: Related Work (mentioning)

confidence: 99%

A Multi-Objective Multi-Label Feature Selection Algorithm Based on Shapley Value

Dong

Sun

2021

Entropy


Multi-label learning is dedicated to learning functions so that each sample is labeled with a true label set. With the increase of data knowledge, the feature dimensionality is increasing. However, high-dimensional information may contain noisy data, making the process of multi-label learning difficult. Feature selection is a technical approach that can effectively reduce the data dimension. In the study of feature selection, the multi-objective optimization algorithm has shown an excellent global optimization performance. The Pareto relationship can handle contradictory objectives in the multi-objective problem well. Therefore, a Shapley value-fused feature selection algorithm for multi-label learning (SHAPFS-ML) is proposed. The method takes multi-label criteria as the optimization objectives and the proposed crossover and mutation operators based on Shapley value are conducive to identifying relevant, redundant and irrelevant features. The comparison of experimental results on real-world datasets reveals that SHAPFS-ML is an effective feature selection method for multi-label classification, which can reduce the classification algorithm’s computational complexity and improve the classification accuracy.
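The citation statement for this entry describes a rank-aggregation step: N feature selection algorithms each produce a ranking, and the Borda Count method merges them into one final ranking. The following is a minimal sketch of that general technique, not the implementation from Mokdad et al. The function name `borda_count`, the feature names, and the three example rankings are all hypothetical, and the alphabetical tie-break is an assumption made only to keep the output deterministic.

```python
from collections import defaultdict


def borda_count(rankings):
    """Merge several best-to-worst rankings of the same items into one.

    An item at position p (0-based) in a ranking of n items earns
    n - 1 - p points; items are then ordered by total points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    # Highest total first; ties broken alphabetically (an assumption).
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical rankings from three feature selection algorithms.
rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f2", "f1", "f4", "f3"],
    ["f1", "f3", "f2", "f4"],
]
consensus = borda_count(rankings)
print(consensus)
```

With these toy rankings, f1 collects 8 points, f2 collects 6, f3 collects 3, and f4 collects 1, so the consensus order is f1, f2, f3, f4.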


“…Yang et al [27] used SHAP to provide an extensive analysis of the relationship between EEG features to different annotation techniques for affect recognition. Shapley values can also be used in place of feature selection metrics [28] to determine the importance of individual features and data samples. So far, few studies compare cross-dataset findings using high-level explanations.…”

Section: Explainable Affect Recognition (mentioning)

confidence: 99%

Emotion Recognition Using Explainable Genetically Optimized Fuzzy ART Ensembles

2021


There is a growing demand for explainability in complex artificial intelligence solutions to support critical applications' decision-making processes. Barriers to explainable processes include blackbox classifiers, such as deep learning, and noisy datasets. Affect recognition involving neural networks attempts to map complex human emotions onto Arousal and Valence scales based on physiological signal measurements. Datasets collected for this purpose are inherently noisy and may contain outliers and imbalanced classes, hindering accurate classification. In our approach, these issues are addressed using Fuzzy ART (FA) for clustering data samples into more condensed memory templates, introducing stochastic resonant noise to amplify signal-to-noise ratio, and SMOTE sampling to generate synthetic minority samples. A genetic algorithm is developed for FA optimization and ensemble model selection. Clusters obtained from the resulting ensembles are then used to train an ensemble of boosted decision trees for classification and to visualize the decision-making processes. Individual features such as heart rate variability and EEG band power, as well as feature interactions between pairs of features, may contain critical information as human affect indicators. Contributions of individual features and feature interactions toward describing human affect are quantified and interpreted using Shapley additive explanation values. Three established affect recognition datasets were considered for mapping physiological features onto a binary classification of Low/High Arousal and Positive/Negative Valence. Our framework was able to achieve good generalization for both classification tasks as well as provide detailed insights into the contributions of physiological features towards describing Arousal and Valence affects.


“…They presented an approach that could directly provide the ranking order of input and output variables separately. Mokdad et al (2016) proposed an optimal feature selection method based on the maximum Shapley value, and validated it by conducting experiments for a classification task based on an SVM classifier. Hur et al (2017) applied the Shapley value with random forest to analyze the influence of variables and list the priority of variables that affected classification accuracy.…”

Section: Measuring Variable Impacts Using the Shapley Value (mentioning)

confidence: 99%

Identifying impact of variables in deep learning models on bankruptcy prediction of construction contractors

Jang

Jeong

Cho

2021

ECAM


Purpose: The study seeks to identify the impact of variables in a deep learning-based bankruptcy prediction model, which has achieved superior performance to other prediction models but whose hidden processes cannot easily be interpreted. Design/methodology/approach: This study developed three LSTM-RNN–based models that predicted the probability of bankruptcy 1, 2, and 3 years in advance using financial, construction market, and macroeconomic variables as input variables. Then, the impacts of the input variables that affected prediction accuracy in each model were identified by using the Shapley value and compared among the three models. This study also investigated the prediction accuracy using variants of input variables grouped sequentially by high-impact ranking. Findings: The results showed that the prediction accuracies were largely impacted by “housing starts” in all models. As the prediction period increased, the effects of macroeconomic variables on prediction accuracy increased, whereas the impact of “return on assets” on prediction accuracy decreased. The study also found that the “current ratio” and “debt ratio” significantly influenced the prediction accuracies in all models. Also, the results revealed that similar prediction accuracies could be achieved using only 8, 10, and 10 variables out of a total of 18 variables for the 1-, 2-, and 3-year prediction models, respectively. Originality/value: This study provides a Shapley value-based approach to identify the impact of each input variable in a deep-learning bankruptcy prediction model. The findings of this study can not only assist in obtaining better insights into the underlying concept of bankruptcy but also be used to select variables by removing those identified as less significant.
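The citation statements on this page refer to the Shapley value without defining it. For orientation: in a cooperative game, a player's Shapley value is its marginal contribution to a coalition, averaged over all coalitions of the other players with weight |S|! (n - |S| - 1)! / n!. The sketch below computes this exactly for a small game. It illustrates the general definition only and is not the procedure of Mokdad et al. or of any citing paper. The function name `shapley_values`, the players "a", "b", "c", and the payoff table are invented toy values, not reported results.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a payoff."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of every coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Invented toy payoffs (for example, a score reached by each coalition).
table = {
    frozenset(): 0.0,
    frozenset("a"): 0.60,
    frozenset("b"): 0.50,
    frozenset("c"): 0.10,
    frozenset("ab"): 0.80,
    frozenset("ac"): 0.65,
    frozenset("bc"): 0.55,
    frozenset("abc"): 0.85,
}
phi = shapley_values(["a", "b", "c"], lambda s: table[frozenset(s)])
best = max(phi, key=phi.get)
print(phi, best)
```

The values sum to the payoff of the full coalition (0.85 here), and the player with the largest value is "a". Exact enumeration grows exponentially with the number of players, so it is only practical for small games.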
