2020
DOI: 10.48550/arxiv.2009.01884
Preprint

Model extraction from counterfactual explanations

Ulrich Aïvodji, Alexandre Bolot, Sébastien Gambs

Abstract: Post-hoc explanation techniques refer to a posteriori methods that can be used to explain how black-box machine learning models produce their outcomes. Among post-hoc explanation techniques, counterfactual explanations are becoming one of the most popular methods to achieve this objective. In particular, in addition to highlighting the most important features used by the black-box model, they provide users with actionable explanations in the form of data instances that would have received a different outcome. …
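The attack sketched in the abstract can be made concrete: every query to the black box yields a labeled instance plus a counterfactual that, by construction, receives a different label, so both can be added to a surrogate's training set. Below is a minimal sketch under that reading, assuming a hypothetical `query_black_box` oracle and a binary classification setting; the scikit-learn surrogate and all names are illustrative, not the paper's exact protocol.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_surrogate(query_black_box, seed_inputs):
    """Train a surrogate from (query, counterfactual) pairs.

    query_black_box(x) is a hypothetical oracle returning
    (label, counterfactual), where the counterfactual is an
    instance that receives the opposite label (binary setting).
    """
    X, y = [], []
    for x in seed_inputs:
        label, counterfactual = query_black_box(x)
        X.append(x)
        y.append(label)
        # The counterfactual is a second labeled point, obtained
        # "for free" with the flipped label.
        X.append(counterfactual)
        y.append(1 - label)
    surrogate = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    surrogate.fit(np.array(X), np.array(y))
    return surrogate
```

Because counterfactuals lie close to the decision boundary, each query contributes two informative points for reproducing that boundary, which is what makes the extraction more query-efficient than prediction-only attacks.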

Cited by 8 publications (26 citation statements) · References 65 publications
“…We explain this attack in detail in Section 3 [8,2]. Milli et al. proposed an attack that used a loss on the distance between a gradient-based explanation of a victim model and that of a clone model, and experimentally showed that this loss improved the efficiency of the attack [8].…”
Section: Related Work (mentioning)
confidence: 99%
“…Milli et al. proposed an attack that used a loss on the distance between a gradient-based explanation of a victim model and that of a clone model, and experimentally showed that this loss improved the efficiency of the attack [8]. Aïvodji et al. proposed an attack that uses the counterfactual explanations of the targeted model to train the clone model more efficiently [2].…”
Section: Related Work (mentioning)
confidence: 99%
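The gradient-explanation loss attributed to Milli et al. [8] can be illustrated with a short PyTorch sketch. It assumes the victim API returns class probabilities together with an input-gradient (saliency) explanation; the weight `lam` is an illustrative trade-off term, not a value from the cited work.

```python
import torch
import torch.nn.functional as F

def extraction_loss(clone, x, victim_probs, victim_saliency, lam=1.0):
    """Distillation loss plus a gradient-explanation matching term.

    victim_probs:    class probabilities returned by the victim API.
    victim_saliency: gradient-based explanation returned alongside them.
    lam:             illustrative trade-off weight (assumption).
    """
    x = x.clone().requires_grad_(True)
    logits = clone(x)
    # Match the victim's predicted distribution.
    pred_loss = F.kl_div(F.log_softmax(logits, dim=-1),
                         victim_probs, reduction="batchmean")
    # Saliency of the clone: gradient of the top-class score w.r.t. input.
    top_score = logits.gather(1, victim_probs.argmax(dim=1, keepdim=True)).sum()
    clone_saliency = torch.autograd.grad(top_score, x, create_graph=True)[0]
    # Penalize the distance between the victim's and the clone's explanations.
    expl_loss = F.mse_loss(clone_saliency, victim_saliency)
    return pred_loss + lam * expl_loss
```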
“…Membership inference attacks can be improved by concatenating gradient explanations with model predictions [38]. Model extraction attacks can be improved by regularizing the reconstructed model with gradient explanations [29] or by training a surrogate model on counterfactual explanation examples [2]. However, exploiting explanations for model inversion attacks remains unexplored; this conceals the privacy risk that explanations pose to user data at prediction time, i.e., to active users.…”
Section: Related Work (mentioning)
confidence: 99%
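The feature-concatenation idea from [38] amounts to feeding an attack classifier the victim's prediction vector joined with its gradient explanation. The sketch below assumes a standard shadow-model setup (member label 1, non-member label 0); the attack model and all names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_attack_features(probs, gradient_explanations):
    """Concatenate prediction vectors with flattened gradient
    explanations, as in explanation-aided membership inference."""
    grads = gradient_explanations.reshape(len(probs), -1)
    return np.concatenate([probs, grads], axis=1)

def train_membership_attack(features, member_labels):
    # member_labels: 1 if the point was in the victim's training
    # set, 0 otherwise (obtained from shadow models in practice).
    attack = RandomForestClassifier(n_estimators=100)
    attack.fit(features, member_labels)
    return attack
```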
“…where x_a and x_b are the two images being compared, μ_* and σ_* denote the pixel-value mean and standard deviation, respectively, C_μ = (K_μ·L)^2 and C_σ = (K_σ·L)^2 are constants that control instability, L is the dynamic range of the pixel values (255 for 8-bit grayscale images), and K_μ = 0.01 and K_σ = 0.03 are chosen to be small. To compare images at different levels of granularity, we compare Gaussian-filtered versions of both images at specified standard deviations σ (smaller σ for a more precise comparison) and compute the mean of the resulting scores.…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
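The excerpt describes an SSIM-style similarity averaged over Gaussian-smoothed copies of the two images. Here is a minimal sketch under that reading, using the stated constants; the exact aggregation in the cited work may differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

K_MU, K_SIGMA, L = 0.01, 0.03, 255.0
C_MU = (K_MU * L) ** 2        # luminance stability constant
C_SIGMA = (K_SIGMA * L) ** 2  # contrast stability constant

def similarity(x_a, x_b, sigmas=(1.0, 2.0, 4.0)):
    """Mean SSIM-style score over Gaussian-smoothed versions of
    two images; smaller sigma gives a more local comparison."""
    scores = []
    for s in sigmas:
        a = gaussian_filter(x_a.astype(float), s)
        b = gaussian_filter(x_b.astype(float), s)
        mu_a, mu_b = a.mean(), b.mean()
        sd_a, sd_b = a.std(), b.std()
        # Luminance and contrast terms, stabilized by C_MU / C_SIGMA.
        lum = (2 * mu_a * mu_b + C_MU) / (mu_a**2 + mu_b**2 + C_MU)
        con = (2 * sd_a * sd_b + C_SIGMA) / (sd_a**2 + sd_b**2 + C_SIGMA)
        scores.append(lum * con)
    return float(np.mean(scores))
```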