Multi-party computation (MPC) has been gaining popularity in recent years as a secure computing model, particularly for machine learning (ML) inference. Compared with its alternatives, MPC incurs lower overhead than homomorphic encryption (HE) and offers a more robust threat model than hardware-based trusted execution environments (TEEs) such as Intel SGX. Despite these advantages, MPC protocols still pay substantial performance penalties compared to plaintext execution when applied to ML algorithms. The overhead comes from added computation and communication costs. For multiplications, which are ubiquitous in ML algorithms, MPC protocols add 32x more computational cost and one round of broadcasting among the MPC servers. Moreover, ML operations that have trivial cost in plaintext, such as Softmax, ReLU, and other non-linear operations, become very expensive due to the added communication. These added overheads make MPC less palatable to deploy in real-time ML inference applications, such as speech translation.

In our studies, we found that most MPC protocols today perform communication and computation in a sequential manner. This serialization is not a poor implementation choice, but a requirement for MPC to work correctly: without the data communication, the parties cannot progress to the next computation step. Thus, GPU servers acting as parties in an MPC setting sit idle while waiting for data transmission to complete, and GPU utilization is low during the communication phase. This phenomenon inspires us to enable MPC servers to perform computation and communication concurrently through a series of novel MPC-abiding computation transformations.

In this work we present MPC-Pipe, an MPC pipeline inference technique that uses two ML-specific schemes: 1) an inter-linear-layer pipeline and 2) an inner-layer pipeline. The first scheme benefits linear layers by transmitting input-independent MPC metadata ahead of time, and the second benefits non-linear layers by breaking large inputs into smaller chunks so that communication and computation overlap. Together, these two techniques shorten the total inference runtime of machine learning models. Our experiments show that MPC-Pipe reduces ML inference latency by up to 12.6% when model weights are private and 14.48% when model weights are public, compared to current MPC protocol implementations.
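To make the inner-layer pipeline idea concrete, the sketch below illustrates one way to overlap communication with computation by chunking a non-linear layer's input. It is a simplified illustration under our own assumptions, not the MPC-Pipe implementation; the names `exchange_shares`, `local_nonlinear_step`, and the chunk count are hypothetical placeholders for a protocol's communication and computation phases.

```python
# Minimal sketch (assumed, not the authors' code) of inner-layer pipelining:
# the input to a non-linear layer is split into chunks so that the share
# exchange for chunk i+1 can proceed while chunk i is being computed locally.
import threading
import numpy as np

def exchange_shares(chunk):
    """Placeholder: blocking share exchange with the other MPC parties."""
    ...  # network I/O happens here; without pipelining the GPU would be idle

def local_nonlinear_step(chunk):
    """Placeholder: local (GPU) computation on one chunk of the input."""
    ...

def pipelined_nonlinear(x, num_chunks=4):
    chunks = np.array_split(x, num_chunks)
    prev_thread, prev_chunk = None, None
    for chunk in chunks:
        # Start the share exchange for the current chunk in the background ...
        t = threading.Thread(target=exchange_shares, args=(chunk,))
        t.start()
        # ... while the previous chunk, whose communication has finished,
        # is consumed by local computation, keeping link and GPU busy together.
        if prev_thread is not None:
            prev_thread.join()
            local_nonlinear_step(prev_chunk)
        prev_thread, prev_chunk = t, chunk
    # Drain the pipeline: finish the last chunk's communication and compute.
    prev_thread.join()
    local_nonlinear_step(prev_chunk)
```

In this sketch the communication for one chunk hides behind the computation of the previous chunk; the actual protocol steps, chunk sizes, and synchronization in MPC-Pipe may differ.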