Proceedings of the Conference on Fairness, Accountability, and Transparency 2019
DOI: 10.1145/3287560.3287562
Model Reconstruction from Model Explanations

Abstract: We show through theory and experiment that gradient-based explanations of a model quickly reveal the model itself. Our results speak to a tension between the desire to keep a proprietary model secret and the ability to offer model explanations. On the theoretical side, we give an algorithm that provably learns a two-layer ReLU network in a setting where the algorithm may query the gradient of the model with respect to chosen inputs. The number of queries is independent of the dimension and nearly optimal in its…
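The intuition behind the abstract can be illustrated with a minimal sketch (not the paper's full algorithm, which targets two-layer ReLU networks): for a purely linear model f(x) = w·x, a single gradient query already reveals the weights exactly, since the gradient with respect to the input is w everywhere. The oracle below is a hypothetical stand-in for a deployed model that exposes gradient-based explanations.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_secret = rng.normal(size=d)  # proprietary model weights, unknown to the attacker

def gradient_oracle(x):
    """Gradient-explanation API: returns grad_x f(x).

    For the linear model f(x) = w·x this is simply w, independent of x.
    (For a ReLU network, the gradient at x instead reveals the sum of the
    weight vectors of the units active at x, which is why recovering the
    full network takes more, carefully chosen, queries.)
    """
    return w_secret.copy()

x_query = rng.normal(size=d)       # any single query point suffices here
w_recovered = gradient_oracle(x_query)

assert np.allclose(w_recovered, w_secret)  # one query reconstructs the model
```

The sketch shows why a gradient is such a strong learning primitive compared with a predicted label: one query returns d real numbers about the parameters, rather than a single bit or class.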

Cited by 116 publications (108 citation statements)
References 15 publications
“…Three organizations mentioned data privacy in the context of explainability, since in some cases explanations can be used to learn about the model [38,63] or the training data [56]. Methods to counter these concerns have been developed.…”
Section: On Privacy
confidence: 99%
“…However, this method requires attackers to know the learning algorithm, the training data, etc. Milli et al. [67] present an algorithm that learns a model by querying the target model's gradients at chosen inputs. It is shown that gradient information can quickly reveal the model parameters.…”
Section: Model Extraction Attack
confidence: 99%
“…It is shown that gradient information can quickly reveal the model parameters. They conclude that gradients are a more efficient learning primitive than predicted labels [67]. However, this method introduces high computational overhead, and they evaluate their model extraction attack only on a two-layer neural network.…”
Section: Model Extraction Attack
confidence: 99%
“…Similarly, network activations may be larger than the available on-chip memory and may be stored in RAM. These activations also need to be encrypted, because even when the device manufacturer is not concerned about privacy, they can be used to infer the model weights [56].…”
Section: Attacks On Deployed Neural Networks
confidence: 99%