Fast TreeSHAP: Accelerating SHAP Value Computation for Trees
Preprint, 2021
DOI: 10.48550/arxiv.2109.09847

Abstract: SHAP (SHapley Additive exPlanation) values are one of the leading tools for interpreting machine learning models, with strong theoretical guarantees (consistency, local accuracy) and a wide availability of implementations and use cases. Even though computing SHAP values takes exponential time in general, TreeSHAP takes polynomial time on tree-based models. While the speedup is significant, TreeSHAP can still dominate the computation time of industry-level machine learning solutions on datasets with millions or…
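To see why exact SHAP computation is exponential in general, note that the Shapley value of feature i averages the model's marginal contribution over all subsets of the remaining features. The brute-force sketch below (an illustration of the definition, not the paper's TreeSHAP algorithm) enumerates every subset, substituting a baseline value for "absent" features; TreeSHAP's contribution is to avoid this enumeration by exploiting tree structure.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values by subset enumeration: O(2^n) model evaluations."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley weight |S|! * (n - |S| - 1)! / n!
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Evaluate f with only features in S (resp. S ∪ {i}) present;
                # absent features are replaced by their baseline values.
                x_S  = [x[j] if j in S else baseline[j] for j in range(n)]
                x_Si = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                phi[i] += w * (f(x_Si) - f(x_S))
    return phi

# For a linear model, the Shapley value of feature i reduces to w_i * (x_i - b_i).
f = lambda z: 2 * z[0] + 3 * z[1] - z[2]
phi = shapley_values(f, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

Local accuracy — the values summing to f(x) minus f(baseline) — holds by construction here, which is one of the guarantees the abstract refers to.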

Cited by 4 publications (4 citation statements)
References 20 publications
“…We choose to utilize the Tree Explainer method instead of the kernel explainer due to its computational efficiency. The Tree Explainer leverages the tree-based structure of the XGB model to approximate the SHAP values, resulting in faster computation times while still providing reliable interpretations of feature importance [67].…”
Section: Feature Importance Modeling and Analysis
Mentioning confidence: 99%
“…To compute SHAP values for different types of machine learning models, various SHAP implementations are available. In this study, the SHAP Linear Explainer function was used for MLR predictors, while the FastTreeSHAP explainer (Yang, 2021) was used for other models. Compared to the widely used TreeSHAP algorithm, FastTreeSHAP provides faster computation of feature importance values for tree-based models.…”
Section: Multiple Models Interpretation
Mentioning confidence: 99%
“…A notable approach in this direction is the path-dependent TreeSHAP algorithm (Lundberg et al. 2020; Yang 2021), which is widely used due to its computational efficiency. It aims to approximate observational SHAP values of tree models by using precomputed node counts, but it implicitly assumes feature independence.…”
Section: Introduction
Mentioning confidence: 99%