shapr: An R-package for explaining machine learning models with dependence-aware Shapley values

Sellereite, Nikolai; Jullum, Martin

doi:10.21105/joss.02027

Cited by 22 publications

(15 citation statements)

References 7 publications

“…On top of already published packages, such as shapper (March 2019) and fastshap (November 2019) there are new, recently created tools that await their publication on CRAN. shapr (Sellereite and Jullum, 2019), treeshap (2020), SHAPforxgboost (2020) are examples of such new packages. Not to mention about other Predict Parts Explanation Methods like iBreakDown.…”

Section: Discussion (mentioning)

confidence: 99%

Landscape of R packages for eXplainable Artificial Intelligence

Maksymiuk, Gosiewska, Biecek

2020

Preprint

The growing availability of data and computing power fuels the development of predictive models. In order to ensure the safe and effective functioning of such models, we need methods for exploration, debugging, and validation. New methods and tools for this purpose are being developed within the eXplainable Artificial Intelligence (XAI) subdomain of machine learning. In this work (1) we present the taxonomy of methods for model explanations, (2) we identify and compare 27 packages available in R to perform XAI analysis, (3) we present an example of an application of particular packages, (4) we acknowledge recent trends in XAI. The article is primarily devoted to the tools available in R, but since it is easy to integrate the Python code, we will also show examples for the most popular libraries from Python.

“…For (2), it would likely be fruitful to explore alternative approaches based on Shapley additive explanations (SHAP; Lundberg & Lee, 2017). SHAP is a game theoretic method for explaining fitted classifiers' predictions and has several extensions that help prevent its performance from degrading in the presence of multicollinearity (Aas, Jullum, & Løland, 2021; Basu & Maji, 2020; Sellereite & Jullum, 2020). We note, however, that SHAP is less computationally efficient than PI, potentially hampering its application to very large-scale data.…”

Section: Discussion (mentioning)

confidence: 99%

Deep Learning-Based Estimation and Goodness-of-Fit for Large-Scale Confirmatory Item Factor Analysis

Urban, Bauer

2021

Preprint

We investigate novel parameter estimation and goodness-of-fit (GOF) assessment methods for large-scale confirmatory item factor analysis (IFA) with many respondents, items, and latent factors. For parameter estimation, we extend Urban and Bauer's (2021) deep learning algorithm for exploratory IFA to the confirmatory setting by showing how to handle user-defined constraints on loadings and factor correlations. For GOF assessment, we explore new simulation-based tests and indices. In particular, we consider extensions of the classifier two-sample test (C2ST), a method that tests whether a machine learning classifier can distinguish between observed data and synthetic data sampled from a fitted IFA model. The C2ST provides a flexible framework that integrates overall model fit, piece-wise fit, and person fit. Proposed extensions include a C2ST-based test of approximate fit in which the user specifies what percentage of observed data can be distinguished from synthetic data as well as a C2ST-based relative fit index that is similar in spirit to the relative fit indices used in structural equation modeling. Via simulation studies, we first show that the confirmatory extension of Urban and Bauer's (2021) algorithm produces more accurate parameter estimates as the sample size increases and obtains comparable estimates to a state-of-the-art confirmatory IFA estimation procedure in less time. We next show that the C2ST-based test of approximate fit controls the empirical type I error rate and detects when the number of latent factors is misspecified. Finally, we empirically investigate how the sampling distribution of the C2ST-based relative fit index depends on the sample size.

“…For the approaches presented in this paper, we have fitted both a non-parametric and a parametric vine. The independence, empirical, Gaussian and Gaussian copula approaches are all implemented in the R package shapr [30], and the plan is to also include the approaches proposed in this paper. The simulation model is detailed in Section 5.1, the actual design of the experiments is given in Section 5.2.…”

Section: Simulation Studies (mentioning)

confidence: 99%

Explaining predictive models using Shapley values and non-parametric vine copulas

Aas, Nägler, Jullum, et al.

2021

Preprint

Self Cite

The original development of Shapley values for prediction explanation relied on the assumption that the features being described were independent. If the features in reality are dependent this may lead to incorrect explanations. Hence, there have recently been attempts of appropriately modelling/estimating the dependence between the features. Although the proposed methods clearly outperform the traditional approach assuming independence, they have their weaknesses. In this paper we propose two new approaches for modelling the dependence between the features. Both approaches are based on vine copulas, which are flexible tools for modelling multivariate non-Gaussian distributions able to characterise a wide range of complex dependencies. The performance of the proposed methods is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula approaches give more accurate approximations to the true Shapley values than its competitors.
