Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.161

Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!

Abstract: Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by pretrained language models such as BERT. This allows corporations to easily build powerful APIs by encapsulating fine-tuned BERT models for downstream tasks. However, when a fine-tuned BERT model is deployed as a service, it may suffer from various attacks launched by malicious users. In this work, we first present how an adversary can steal a BERT-based API service (the victim/t…

Cited by 36 publications (19 citation statements)
References: 26 publications
“…Model extraction attack (MEA) or imitation attack has received significant attention in the past years (Tramèr et al. 2016; Correia-Silva et al. 2018; Wallace, Stern, and Song 2020; Krishna et al. 2020; He et al. 2021a; Xu et al. 2021). MEA aims to imitate the functionality of a black-box victim model.…”
Section: Model Extraction Attack
confidence: 99%
“…V can process customer queries and return the predictions y as its response. Note that if T is a classification problem, y is a predicted label or a probability vector (Krishna et al. 2020; Szyller et al. 2021; He et al. 2021a). If T is a generation task, y can be a sequence of tokens (Wallace, Stern, and Song 2020; Xu et al. 2021).…”
Section: Model Extraction Attack
confidence: 99%
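
The query-and-imitate protocol described in these statements can be sketched in a few lines. The following is a minimal illustration, not the procedure from any of the cited works: the black-box victim V is replaced by a stub victim_predict that returns a hard label y, and the imitation model is a simple TF-IDF classifier standing in for a fine-tuned BERT student. All names (victim_predict, transfer_set) are illustrative assumptions.

    # Minimal sketch of a model extraction (imitation) attack on a
    # black-box text-classification API. The victim is a stand-in stub;
    # in the setting above it would be a fine-tuned BERT service.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def victim_predict(text: str) -> int:
        """Stand-in for the black-box victim API V: returns a label y."""
        return int("great" in text.lower() or "good" in text.lower())

    # 1. The adversary assembles a pool of (possibly out-of-domain) queries.
    queries = [
        "the movie was great and moving",
        "a good, heartfelt performance",
        "dull plot and wooden acting",
        "i would not watch this again",
    ]

    # 2. Label the queries by calling the victim, paying only the per-query cost.
    transfer_set = [(q, victim_predict(q)) for q in queries]

    # 3. Train an imitation (extracted) model on the victim's outputs.
    texts, labels = zip(*transfer_set)
    imitation_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    imitation_model.fit(texts, labels)

    # The extracted model now approximates V's decision function on similar inputs.
    print(imitation_model.predict(["such a great film", "terribly dull"]))

In practice the adversary would fine-tune a pretrained language model on the transfer set, and could train on probability vectors rather than hard labels whenever the API exposes them.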
“…Recent works have focused on the severe robustness problems of BLMs, which mainly fall into two types: (1) Adversarial Attacking, which generates new samples by applying small perturbations to the original inputs to mislead the BLMs into making wrong predictions. Current works use the predictions, prediction probabilities, and gradients of the fine-tuned BLMs to search for adversarial examples, ranging from char-level attacking [460], word-level attacking [461,462,463,464,465], and sentence-level attacking [466,467] to multi-level attacking [468,469], showing that the robustness of BLMs to adversarial attacking is still far from perfect; (2) Backdoor Attacking, which inserts instances with specifically designed patterns into the training data so that the trained BLMs perform well on normal samples but behave badly on samples containing these patterns. Existing backdoor attacking works on big models mainly focus on exploring more types of triggers [470], data-free backdoor attacking [471], effectiveness on clean sets [472], effectiveness after fine-tuning [473,474], and stealthy attacking [475].…”
Section: Model Analysis
confidence: 99%
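
As a rough illustration of the word-level adversarial attacking mentioned in the statement above, the sketch below greedily substitutes one word at a time and keeps a substitution if it flips the model's prediction. The SYNONYMS table and the toy model are assumptions made for demonstration, not the search strategies of the cited attacks.

    # Minimal sketch of a greedy word-level adversarial attack: swap one word
    # for a hand-picked synonym and keep it if the prediction flips.
    from typing import Callable, Dict, List

    SYNONYMS: Dict[str, List[str]] = {
        "great": ["fine", "decent"],
        "dull": ["slow", "flat"],
    }

    def word_level_attack(text: str, model: Callable[[str], int]) -> str:
        """Greedy one-word substitution; returns an adversarial example if found."""
        original_label = model(text)
        tokens = text.split()
        for i, tok in enumerate(tokens):
            for candidate in SYNONYMS.get(tok, []):
                perturbed = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
                if model(perturbed) != original_label:  # prediction flipped
                    return perturbed
        return text  # no successful perturbation found

    # Toy victim: predicts positive (1) only if the word "great" is present.
    toy_model = lambda s: int("great" in s)
    print(word_level_attack("a great movie", toy_model))  # -> "a fine movie"

Real word-level attacks replace the hand-written synonym table with embedding- or language-model-based candidate generation and add constraints so that the perturbed text stays fluent and semantically close to the original.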
“…Unfortunately, previous works have validated that the functionality of a victim API can be stolen through imitation attacks, which query the victim with carefully crafted inputs and train an imitation model on the outputs of the target API. Such attacks cause severe IP violations of the target API and stifle the creativity and motivation of our research community [44,48,20,12]. Figure 1: Ratio change of word frequency of the top 100 words between benign and watermarked corpora used by [13], namely P_b(w)/P_w(w).…”
Section: Introduction
confidence: 99%
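
The ratio P_b(w)/P_w(w) in the quoted figure caption is simply the relative frequency of a word w in the benign corpus divided by its relative frequency in the watermarked corpus. The sketch below only shows that computation; the two toy corpora are placeholders, not the data from the cited work.

    # Minimal sketch of the word-frequency ratio P_b(w)/P_w(w) between a
    # benign and a watermarked corpus; ratios far from 1 indicate words
    # whose frequency the watermark has shifted.
    from collections import Counter

    def relative_freqs(corpus):
        tokens = [w for doc in corpus for w in doc.lower().split()]
        counts = Counter(tokens)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    benign = ["the film was good", "the plot was thin"]
    watermarked = ["the film was good indeed", "the plot was rather thin"]

    p_b = relative_freqs(benign)
    p_w = relative_freqs(watermarked)

    ratios = {w: p_b[w] / p_w[w] for w in p_b if w in p_w}
    for w, r in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{w}: {r:.2f}")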