2022
DOI: 10.48550/arxiv.2201.12675
Preprint

Decepticons: Corrupted Transformers Breach Privacy in Federated Learning for Language Models

Abstract: A central tenet of Federated learning (FL), which trains models without centralizing user data, is privacy. However, previous work has shown that the gradient updates used in FL can leak user information. While the most industrial uses of FL are for text applications (e.g. keystroke prediction), nearly all attacks on FL privacy have focused on simple image classifiers. We propose a novel attack that reveals private user text by deploying malicious parameter vectors, and which succeeds even with mini-batches, m…
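The abstract's premise, that individual gradient updates can expose user data, can be illustrated with a well-known closed-form case: for a single sample passing through a fully connected layer with a bias, the input is recoverable exactly from the weight and bias gradients. The sketch below is only that background illustration, not the paper's malicious-parameter attack on transformer language models; the layer sizes and variable names are made up for the example.

```python
import torch
import torch.nn as nn

# Background illustration (not the Decepticons attack): for one sample x
# through a linear layer z = Wx + b, the gradients satisfy
#   dL/dW = (dL/dz) x^T   and   dL/db = dL/dz,
# so any row of dL/dW divided by the matching entry of dL/db recovers x.
torch.manual_seed(0)
layer = nn.Linear(8, 3)          # hypothetical toy dimensions
loss_fn = nn.CrossEntropyLoss()

x_private = torch.randn(1, 8)    # the client's private input
y_private = torch.tensor([1])
loss = loss_fn(layer(x_private), y_private)
grad_w, grad_b = torch.autograd.grad(loss, (layer.weight, layer.bias))

# Pick an output unit with a nonzero bias gradient and divide row-wise.
i = torch.argmax(grad_b.abs())
x_recovered = grad_w[i] / grad_b[i]

print(torch.allclose(x_recovered, x_private.squeeze(0), atol=1e-5))  # True
```

A single-sample update through such a layer therefore behaves essentially like plaintext; the paper's contribution, per the abstract, is extending recovery to user text passing through transformer updates, even when gradients are aggregated over mini-batches.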

Cited by 3 publications (3 citation statements) | References 23 publications
“…For example, an adversary that controls part of the training code can use the trained model as a side-channel to exfiltrate training data [3,61]. Or in federated learning, a malicious server can select model architectures that enable reconstructing training samples [9,20]. Alternatively, participants in decentralized learning protocols can boost privacy attacks by sending dynamic malicious updates [44,51,69].…”
Section: Attacks on Training Integrity (mentioning)
confidence: 99%
“…However, this comes at the cost of increased computational or communication costs between the clients and the server, and with increasing concerns about compromised user privacy. Privacy of user data is a growing concern, and standard federated averaging techniques are vulnerable to data leakage by inverting gradients into the data that generated them [1]-[5]. Gradients can be encrypted to preserve privacy, but incur further communication overhead [6].…”
Section: Introduction (mentioning)
confidence: 99%
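The gradient-inversion attacks this statement refers to are typically posed as an optimization problem: the attacker searches for dummy inputs and labels whose induced gradients match the update observed from a client. The following is a minimal sketch of that idea in the style of gradient-matching attacks, using a hypothetical toy linear classifier rather than code from any of the cited works.

```python
import torch
import torch.nn as nn

# Toy stand-in for the shared model; real attacks target much larger nets.
model = nn.Linear(32, 4)
criterion = nn.CrossEntropyLoss()

# --- Client side: one honest gradient update on private data ---
x_private = torch.randn(1, 32)
y_private = torch.tensor([2])
true_grads = torch.autograd.grad(criterion(model(x_private), y_private),
                                 model.parameters())
true_grads = [g.detach() for g in true_grads]

# --- Attacker side: optimize dummy data so its gradients match ---
x_dummy = torch.randn(1, 32, requires_grad=True)
y_dummy = torch.randn(1, 4, requires_grad=True)   # soft (relaxed) labels
optimizer = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    optimizer.zero_grad()
    pred = torch.log_softmax(model(x_dummy), dim=-1)
    dummy_loss = torch.mean(
        torch.sum(-torch.softmax(y_dummy, dim=-1) * pred, dim=-1))
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(),
                                      create_graph=True)
    # Distance between the gradients the dummy data produces and the
    # gradients actually observed from the client.
    grad_diff = sum(((dg - tg) ** 2).sum()
                    for dg, tg in zip(dummy_grads, true_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(30):
    optimizer.step(closure)

print("reconstruction error:", torch.norm(x_dummy.detach() - x_private).item())
```

The encrypted-gradient defense mentioned in the quote works precisely by denying the server plaintext access to the per-client update that plays the role of true_grads in this sketch.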
“…This is particularly the case when participants are allowed to deviate from the predefined ML protocol (in a malicious adversary setting). When training a federated learning model, each potentially malicious participant can send false data on purpose [21] to prevent learning of the global model [22][23][24]. Furthermore, in an iterative procedure, any participant could compare the last global model with the previous state.…”
Section: Introduction (mentioning)
confidence: 99%