Model Explainability using SHAP Values for LightGBM Predictions

Bugaj, Michal; Wróbel, Krzysztof; Iwaniec, Joanna

doi:10.1109/memstech53091.2021.9468078

Cited by 10 publications

(3 citation statements)

References 8 publications


“…To facilitate the application of the prediction models, we conducted feature reduction by illustrating the changes in the prediction accuracy of the models with different numbers of input features (ie, those with top 10, 20, 50, and 100 SHAP values) [34, 35]. As shown in Figure S3 of Multimedia Appendix 1, the models for predicting suicidal behaviors within 1 year and from 1 to 6 years both achieved overall good performance when the input feature dimension with the highest SHAP value was increased to 20, so we considered the models with 20 input features as the applicable prediction models which might facilitate the future implication.…”

Section: Methods (mentioning)

confidence: 99%

Prediction of Suicidal Behaviors in the Middle-aged Population: Machine Learning Analyses of UK Biobank

Wang¹, Qiu², Zhu³, et al. 2023

JMIR Public Health Surveill


Background: Suicidal behaviors, including suicide deaths and attempts, are major public health concerns. However, previous suicide models required a huge amount of input features, resulting in limited applicability in clinical practice.

Objective: We aimed to construct applicable models (ie, with limited features) for short- and long-term suicidal behavior prediction. We further validated these models among individuals with different genetic risks of suicide.

Methods: Based on the prospective cohort of UK Biobank, we included 223 (0.06%) eligible cases of suicide attempts or deaths, according to hospital inpatient or death register data within 1 year from baseline and randomly selected 4460 (1.18%) controls (1:20) without such records. We similarly identified 833 (0.22%) cases of suicidal behaviors 1 to 6 years from baseline and 16,660 (4.42%) corresponding controls. Based on 143 input features, mainly including sociodemographic, environmental, and psychosocial factors; medical history; and polygenic risk scores (PRS) for suicidality, we applied a bagged balanced light gradient-boosting machine (LightGBM) with stratified 10-fold cross-validation and grid-search to construct the full prediction models for suicide attempts or deaths within 1 year or between 1 and 6 years. The Shapley Additive Explanations (SHAP) approach was used to quantify the importance of input features, and the top 20 features with the highest SHAP values were selected to train the applicable models. The external validity of the established models was assessed among 50,310 individuals who participated in UK Biobank repeated assessments both overall and by the level of PRS for suicidality.

Results: Individuals with suicidal behaviors were on average 56 years old, with equal sex distribution. The application of these full models in the external validation data set demonstrated good model performance, with the area under the receiver operating characteristic (AUROC) curves of 0.919 and 0.892 within 1 year and between 1 and 6 years, respectively. Importantly, the applicable models with the top 20 most important features showed comparable external-validated performance (AUROC curves of 0.901 and 0.885) as the full models, based on which we found that individuals in the top quintile of predicted risk accounted for 91.7% (n=11) and 80.7% (n=25) of all suicidality cases within 1 year and during 1 to 6 years, respectively. We further obtained comparable prediction accuracy when applying these models to subpopulations with different genetic susceptibilities to suicidality. For example, for the 1-year risk prediction, the AUROC curves were 0.907 and 0.885 for the high (>2nd tertile of PRS) and low (<1st) genetic susceptibilities groups, respectively.

Conclusions: We established applicable machine learning–based models for predicting both the short- and long-term risk of suicidality with high accuracy across populations of varying genetic risk for suicide, highlighting a cost-effective method of identifying individuals with a high risk of suicidality.
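The feature-reduction step described in this abstract (keeping the top 20 features by SHAP value) can be illustrated with a minimal sketch. It assumes that features are ranked by their mean absolute SHAP value across samples, which is a common convention but is not confirmed by the abstract. The function name `top_k_features`, the feature names, and the toy SHAP matrix are all hypothetical.

```python
def top_k_features(shap_rows, feature_names, k):
    """Return the k feature names with the largest mean absolute SHAP value.

    shap_rows: list of per-sample lists, one SHAP value per feature.
    Ranking by mean |SHAP| is an assumed convention, not the paper's
    confirmed procedure.
    """
    n_features = len(feature_names)
    importance = [
        sum(abs(row[j]) for row in shap_rows) / len(shap_rows)
        for j in range(n_features)
    ]
    # Sort feature indices from most to least important.
    ranked = sorted(range(n_features), key=lambda j: importance[j], reverse=True)
    return [feature_names[j] for j in ranked[:k]]


# Toy SHAP matrix: 3 samples x 4 hypothetical features (values are made up).
shap_rows = [
    [0.50, -0.10, 0.02, -0.30],
    [-0.40, 0.05, 0.01, 0.35],
    [0.60, -0.15, 0.03, -0.25],
]
names = ["f_a", "f_b", "f_c", "f_d"]
selected = top_k_features(shap_rows, names, k=2)
print(selected)
```

In practice the SHAP matrix would come from an explainer applied to the trained model, and the reduced model would be retrained on the selected columns only.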


“…SHAP is a model additive explanation approach from cooperative game theory. The method presents and explains the prediction concerning the contribution of each feature to the predicted value (Bugaj et al, 2021). Being a model-agnostic methodology, SHAP can explain individual predictions without being limited to a specific machine-learning model.…”

Section: Shap Values (mentioning)

confidence: 99%

Interpreting direct sales’ demand forecasts using SHAP values

Arboleda-Florez, Zuluaga 2023

Prod.


Paper aims: Several concerns regarding the lack of interpretability of machine learning models obstruct the implementation of machine learning projects as part of the demand forecasting process. This paper presents a methodology to support the introduction of machine learning into the forecasting process of a traditional direct sales company by providing explanations for the otherwise obscure results. We also suggest incorporating human knowledge inside the machine learning pipeline as an essential part of capturing the business logic and integrating machine learning into the existing processes.

Originality: Using explainable machine learning methods on real-life company data demonstrates that machine learning techniques are functional beyond academia and can be introduced into the production of everyday companies.

Research method: The project used real-world data from a company and followed a traditional machine learning pipeline to collect, preprocess, select, and train a machine learning model, concluding with the explanation of the model results through the implementation of SHAP.

Main findings: The results provided insights regarding the contribution of the features to the forecast. We analyzed individual predictions to understand the behavior of different variables, which proved helpful when interpreting complex machine learning models.

Implications for theory and practice: This study contributes to a discussion about adopting new technology and implementing machine learning models for demand forecasting. The methodology presented in this paper can be used to implement similar projects at interested companies.
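The citation statement above describes SHAP as attributing a prediction to the contribution of each feature, an idea taken from cooperative game theory. The sketch below computes exact Shapley values for a single prediction by enumerating all feature subsets. It is an illustration of the underlying definition only, not the algorithm used by the SHAP library or by any of the cited papers. It assumes that a "missing" feature is replaced by a fixed baseline value, and `toy_model`, `x`, and `baseline` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating feature subsets.

    Features outside a subset are set to their baseline value; this is one
    assumed way to represent a "missing" feature. Cost grows as 2^n, so this
    is only practical for a handful of features.
    """
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their real values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive term and one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that gives SHAP its name. The function only calls `predict`, so it works for any model, in line with the model-agnostic framing in the quote.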


“…To achieve this, the use of LightGBM (Light Gradient Boosted Machine) data analysis technique for bank marketing can be a solution to enhance the effectiveness of marketing campaign strategies. LightGBM is a gradient boosting framework based on decision trees developed by Microsoft in 2017 [2]. In terms of CPU execution time and accuracy, LightGBM outperforms other gradient boosting methods significantly [3], [4].…”

Section: Introduction (mentioning)

confidence: 99%

Optimizing Bank Marketing Strategies Through Analysis Using Lightgbm

Diaz Aditya, Yudha Satria 2023

coreid


Marketing campaigns in a bank are one of the ways for the bank to achieve its organizational goals. Optimal marketing is a crucial factor for a bank's success in attracting and retaining customers. Therefore, if a bank's marketing campaigns are carried out suboptimally, it will be challenging to achieve the goals of those campaigns. In this case study, it can be observed that the number of customers who subscribe to fixed-term deposits is lower, with a proportion of 5289 customers making deposits and 5873 customers not making deposits. This research aims to optimize bank marketing strategies by applying analysis using the LightGBM algorithm, which is a highly effective and efficient Gradient Boosting Decision Tree algorithm. This approach facilitates the design of more optimal marketing strategies. The accuracy score of the predictive model generated is 0.8584, with an F1 score of 0.8564, including 974 true negatives and 943 true positives.
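The accuracy and F1 score reported in this abstract are both derived from confusion-matrix counts. The sketch below shows the standard formulas, assuming a binary F1 computed for the positive class; the abstract does not say which averaging was used. The abstract reports only true negatives and true positives, so the counts in the example call are toy values unrelated to the paper, and `accuracy_and_f1` is a hypothetical helper.

```python
def accuracy_and_f1(tp, tn, fp, fn):
    """Accuracy and positive-class F1 from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Toy counts for illustration only; they are not taken from the paper.
acc, f1 = accuracy_and_f1(tp=8, tn=9, fp=1, fn=2)
print(f"accuracy={acc:.4f}, f1={f1:.4f}")
```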
