2022
DOI: 10.48550/arxiv.2203.03929
Preprint

Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks

Abstract: The wide adoption and application of masked language models (MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities: to what extent do MLMs leak information about their training data? Prior attempts at measuring leakage of MLMs via membership inference attacks have been inconclusive, implying potential robustness of MLMs to privacy attacks. In this work, we posit that prior attempts were inconclusive because they based their attack …
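The contrast the abstract draws, attacks scored by the target model's loss alone versus attacks calibrated against a reference model, can be summarised in a short sketch. The Python below is purely illustrative: the function names, the single-threshold decision rule, and the use of a per-sample "loss" value are assumptions for exposition, not the paper's exact formulation.

```python
# Conceptual sketch of two membership-inference scoring rules.
# "loss" stands for a model's negative log-likelihood of the sample under attack.

def loss_attack_score(target_loss: float) -> float:
    """Loss-only MIA: lower loss under the target model => higher membership score."""
    return -target_loss

def likelihood_ratio_score(target_loss: float, reference_loss: float) -> float:
    """Reference-calibrated (likelihood-ratio-style) MIA: compare the target
    model's loss to that of a reference model trained on similar but disjoint
    data, so samples that are simply "easy" for any model are not flagged."""
    return reference_loss - target_loss

def is_member(score: float, threshold: float) -> bool:
    """Decide membership by thresholding the score; in practice the threshold
    would be tuned on known non-members to hit a desired false-positive rate."""
    return score > threshold
```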

Cited by 8 publications (7 citation statements)
References 20 publications
“…Memorization in Language Models: Unintended memorization is a known challenge for language models [12,13], which makes them open to extraction attacks [14,15] and membership inference attacks [16,17], although there has been work on mitigating these vulnerabilities [11,18]. Recent work has argued that memorization is not exclusively harmful, and can be crucial for certain types of generalization (e.g., on QA tasks) [19,20,21], while also allowing the models to encode significant amounts of world or factual knowledge [22,23,24].…”
Section: Background and Related Work
confidence: 99%
“…Membership Inference Attacks in NLP: Specifically in NLP, membership inference attacks are an important component of language model extraction attacks (Carlini et al., 2021b; Mireshghallah et al., 2022b). Further studies of interest include work by Hisamoto et al. (2020), which studies membership inference attacks in machine translation, as well as work by Mireshghallah et al. (2022a), which investigates likelihood ratio attacks for masked language models. Specifically for language models, a large body of work also studies the related phenomenon of memorization (Kandpal et al., 2022; Carlini et al., 2022b,a; Zhang et al., 2021), which enables membership inference and data extraction attacks in the first place.…”
Section: Related Work
confidence: 99%
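Because a masked language model has no single left-to-right likelihood, likelihood-ratio-style attacks of the kind this statement refers to typically score a sample with a pseudo-log-likelihood. Below is a minimal sketch using HuggingFace transformers; the mask-one-token-at-a-time approximation and the checkpoint names are assumptions for illustration, not necessarily the exact mechanics of the attack studied by Mireshghallah et al. (2022a).

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def pseudo_log_likelihood(text: str, model, tokenizer) -> float:
    """Sum of log-probabilities of each token when it is masked out in turn."""
    model.eval()
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the special tokens at positions 0 and -1 ([CLS]/[SEP] for BERT-style models).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total += log_probs[input_ids[i]].item()
    return total

# Illustrative usage; checkpoint names are placeholders, not the cited setup.
# (DistilBERT shares BERT's uncased vocabulary, so one tokenizer suffices here.)
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
target = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")            # stand-in target model
reference = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")   # stand-in reference model
sample = "Example sentence whose training-set membership we want to test."
score = (pseudo_log_likelihood(sample, target, tok)
         - pseudo_log_likelihood(sample, reference, tok))  # likelihood-ratio-style score
```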
“…In contrast, the ability to poison a small fraction of the training set may be much more realistic (especially for very large models). Recent work [11,46,68,72] shows that generic non-calibrated MI attacks (without poisoning) perform no better than chance at low false-positive rates (see Figure 5). With poisoning, however, these non-calibrated attacks perform extremely well.…”
Section: Are Shadow Models Necessary?
confidence: 99%
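The low false-positive-rate regime this statement emphasises can be made concrete with a small evaluation sketch: report the attack's true-positive rate at a fixed, small false-positive rate rather than an aggregate metric like AUC. The scores below are synthetic and the 0.1% target FPR is an illustrative choice, not a number taken from the cited works.

```python
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(labels, scores, target_fpr=1e-3):
    """True-positive rate of a membership-inference attack at a fixed low FPR."""
    fpr, tpr, _ = roc_curve(labels, scores)
    ok = fpr <= target_fpr
    # Best achievable TPR among thresholds whose FPR stays within the target.
    return float(tpr[ok].max()) if ok.any() else 0.0

# Synthetic scores standing in for real attack outputs (members score higher on average).
rng = np.random.default_rng(0)
member_scores = rng.normal(0.5, 1.0, 1000)
nonmember_scores = rng.normal(0.0, 1.0, 1000)
labels = np.r_[np.ones(1000), np.zeros(1000)]
scores = np.r_[member_scores, nonmember_scores]
print(f"TPR at 0.1% FPR: {tpr_at_fpr(labels, scores, 1e-3):.3f}")
```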