2018
DOI: 10.48550/arxiv.1802.04889
Preprint

Understanding Membership Inferences on Well-Generalized Learning Models

Abstract: Membership Inference Attack (MIA) determines the presence of a record in a machine learning model's training data by querying the model. Prior work has shown that the attack is feasible when the model is overfitted to its training data or when the adversary controls the training algorithm. However, when the model is not overfitted and the adversary does not control the training algorithm, the threat is not well understood. In this paper, we report a study that discovers overfitting to be a sufficient but not a…
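For reference, the baseline black-box attack that the excerpts below build on simply thresholds the target model's per-record loss: training members tend to have unusually low loss. The following is a minimal sketch under that assumption; the function names and the threshold choice are illustrative and are not the procedure developed in this paper.

```python
import numpy as np

def cross_entropy_loss(probs, label, eps=1e-12):
    """Per-example cross-entropy loss from the model's predicted class probabilities."""
    return -np.log(probs[label] + eps)

def loss_threshold_mia(predict_proba, record, label, threshold):
    """Guess 'member' when the target model's loss on (record, label) is below a threshold.

    predict_proba: black-box query returning a probability vector for `record`.
    threshold: assumed known to the adversary, e.g. an estimate of the average training loss.
    """
    loss = cross_entropy_loss(predict_proba(record), label)
    return loss < threshold
```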

Cited by 62 publications (75 citation statements).
References 32 publications (44 reference statements).
“…We argue that a simple modification to the score, which we call difficulty calibration, can drastically improve the attack's reliability. This approach has been applied to the loss score for high-precision attack against well-generalized models (Long et al, 2018). Let s(h, (x, y)) be the membership score, where higher score indicates a stronger signal that the sample (x, y) is a member.…”
Section: Difficulty Calibration (mentioning, confidence: 99%)
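To make the calibrated score concrete, here is a minimal sketch assuming the membership score s(h, (x, y)) is the negative per-example loss and that the adversary holds reference models trained on similar data; the `predict_proba` interface and the function names are hypothetical, not taken from the cited works.

```python
import numpy as np

def membership_score(model, x, y, eps=1e-12):
    """Higher score = stronger membership signal; here the negative cross-entropy loss."""
    probs = model.predict_proba(x)        # black-box query to the model
    return float(np.log(probs[y] + eps))  # log-likelihood of the true label = -loss

def calibrated_score(target_model, reference_models, x, y):
    """Difficulty calibration: subtract the sample's average score under reference
    models, so intrinsically 'easy' samples are not mistaken for training members."""
    raw = membership_score(target_model, x, y)
    baseline = np.mean([membership_score(m, x, y) for m in reference_models])
    return raw - baseline
```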
“…Various forms of difficulty calibration have been considered in the context of high-precision attacks. Long et al (2018) selected samples that differ the most in loss between the target and a set of reference models, and showed that the resulting attack has high precision even for well-generalized target models. Carlini et al (2020) showed that such privacy attacks are also possible on large-scale language models such as GPT-2 (Radford et al, 2019).…”
Section: Introduction (mentioning, confidence: 99%)
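As a rough sketch of that selection rule: the array-based interface and the fixed cutoff k below are simplifying assumptions, and the vulnerable-record identification in Long et al. (2018) is more elaborate than this.

```python
import numpy as np

def select_likely_members(target_losses, reference_losses, k=10):
    """Report as members the k candidate records whose target-model loss falls most
    below their average loss under reference models (largest calibration gap).

    target_losses:    shape (n,)    per-record loss under the target model
    reference_losses: shape (n, r)  per-record losses under r reference models
    """
    gap = reference_losses.mean(axis=1) - target_losses  # large gap => likely member
    return np.argsort(gap)[::-1][:k]                      # indices of the k largest gaps
```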
“…A substantial body of literature followed this work, extending the attacks to different settings such as white-box analysis [Nasr et al, 2019, Sablayrolles et al, 2019, Leino and Fredrikson, 2020], label-only access [Li and Zhang, 2020, Choquette-Choo et al, 2021], federated learning [Nasr et al, 2019], transfer learning [Zou et al, 2020], and different types of data and models, such as aggregate location data [Pyrgelis et al, 2017], generative models [Hayes et al, 2019], language models [Carlini et al, 2019, Carlini et al, 2020], sentence embeddings [Song and Raghunathan, 2020], and speech recognition models [Shah et al, 2021]. Multiple works have looked at improving the attack methodology through a more fine-grained analysis or by reducing the background knowledge and the compute power required to execute the attack [Long et al, 2018, Song and Mittal, 2021, Salem et al, 2018]. All these works follow the same attack framework for membership inference, but they either exploit a slightly different signal that is correlated with membership of a point in the training set or find an efficient way to exploit the already known signals.…”
Section: Related Work (mentioning, confidence: 99%)
“…4.1 Attack S: MIA via Shadow models. Starting from Shokri et al. [2017], a substantial body of literature Nasr et al. [2019], Sablayrolles et al. [2019], Leino and Fredrikson [2020], Long et al. [2018], Song and Mittal [2021], Salem et al. …”
(mentioning, confidence: 99%)
“…They formalize the attack as a binary classification task and utilize a neural network (NN) model along with the shadow training technique to distinguish members of the training set from non-members. Following this work, many MIAs have been proposed, which can be divided into two categories: direct attacks [32,33,46,47,54,55], which directly query the target sample and typically utilize only a single query; and indirect attacks [8,28,29], which query for samples in the neighborhood of the target sample to infer membership, and typically utilize multiple queries. The research community has further extended the MIA to federated settings [31,33] and generative models [14].…”
Section: Introduction (mentioning, confidence: 99%)
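A compact sketch of that shadow-training pipeline, assuming scikit-learn-style models and hypothetical helper names; the original Shokri et al. (2017) attack trains one neural-network attack model per class label, which is simplified to a single logistic regression here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_attack_dataset(shadow_models, shadow_splits):
    """Turn shadow models into labelled training data for the attack classifier.

    shadow_splits: one (train_records, out_records) pair per shadow model, where each
    record is (x, class_label). A shadow model's own training records are labelled
    1 (member); its held-out records are labelled 0 (non-member).
    """
    feats, labels = [], []
    for model, (train_records, out_records) in zip(shadow_models, shadow_splits):
        for records, is_member in ((train_records, 1), (out_records, 0)):
            for x, _class_label in records:
                # Sorted confidence vector of the shadow model as the attack feature.
                feats.append(np.sort(model.predict_proba([x])[0])[::-1])
                labels.append(is_member)
    return np.array(feats), np.array(labels)

def train_attack_model(shadow_models, shadow_splits):
    """Binary membership classifier, later applied to the target model's outputs."""
    X, y = build_attack_dataset(shadow_models, shadow_splits)
    return LogisticRegression(max_iter=1000).fit(X, y)
```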