Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.161

Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!

Abstract: Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by pretrained language models such as BERT. This allows corporations to easily build powerful APIs by encapsulating fine-tuned BERT models for downstream tasks. However, when a fine-tuned BERT model is deployed as a service, it may suffer from various attacks launched by malicious users. In this work, we first present how an adversary can steal a BERT-based API service (the victim/t…

Cited by 36 publications (19 citation statements)
References: 26 publications
“…Model extraction attack (MEA) or imitation attack has received significant attention in the past years (Tramèr et al. 2016; Correia-Silva et al. 2018; Wallace, Stern, and Song 2020; Krishna et al. 2020; He et al. 2021a; Xu et al. 2021). MEA aims to imitate the functionality of a black-box victim model.…”
Section: Model Extraction Attack
confidence: 99%
“…V can process customer queries and return the predictions y as its response. Note that if T is a classification problem, y is a predicted label or a probability vector (Krishna et al. 2020; Szyller et al. 2021; He et al. 2021a). If T is a generation task, y can be a sequence of tokens (Wallace, Stern, and Song 2020; Xu et al. 2021).…”
Section: Model Extraction Attack
confidence: 99%
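
The query-and-imitate protocol described in these statements can be sketched in a few lines. The following is a minimal illustration, not the procedure from any of the cited works: the black-box victim V is replaced by a stub victim_predict that returns a hard label y, and the imitation model is a simple TF-IDF classifier standing in for a fine-tuned BERT student. All names (victim_predict, transfer_set) are illustrative assumptions.

    # Minimal sketch of a model extraction (imitation) attack on a
    # black-box text-classification API. The victim is a stand-in stub;
    # in the setting above it would be a fine-tuned BERT service.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def victim_predict(text: str) -> int:
        """Stand-in for the black-box victim API V: returns a label y."""
        return int("great" in text.lower() or "good" in text.lower())

    # 1. The adversary assembles a pool of (possibly out-of-domain) queries.
    queries = [
        "the movie was great and moving",
        "a good, heartfelt performance",
        "dull plot and wooden acting",
        "i would not watch this again",
    ]

    # 2. Label the queries by calling the victim, paying only the per-query cost.
    transfer_set = [(q, victim_predict(q)) for q in queries]

    # 3. Train an imitation (extracted) model on the victim's outputs.
    texts, labels = zip(*transfer_set)
    imitation_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    imitation_model.fit(texts, labels)

    # The extracted model now approximates V's decision function on similar inputs.
    print(imitation_model.predict(["such a great film", "terribly dull"]))

In practice the adversary would fine-tune a pretrained language model on the transfer set, and could train on probability vectors rather than hard labels whenever the API exposes them.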
“…Recent works have focused on the severe robustness problems of BLMs, which mainly fall into two types: (1) Adversarial Attacking, which generates new samples by applying small perturbations to the original inputs to mislead the BLMs into making wrong predictions. Current works use the predictions, prediction probabilities, and gradients of the fine-tuned BLMs to search for adversarial examples, ranging from char-level attacking [460], word-level attacking [461,462,463,464,465], and sentence-level attacking [466,467] to multi-level attacking [468,469], showing that the robustness of BLMs to adversarial attacking is still far from perfect; (2) Backdoor Attacking, which inserts instances with specifically designed patterns into the training data so that the trained BLMs perform well on normal samples but behave badly on samples containing these patterns. Existing backdoor attacking works on big models mainly focus on exploring more types of triggers [470], data-free backdoor attacking [471], effectiveness on clean sets [472], effectiveness after fine-tuning [473,474], and stealthy attacking [475].…”
Section: Model Analysis
confidence: 99%
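
As a rough illustration of the word-level adversarial attacking mentioned in the statement above, the sketch below greedily substitutes one word at a time and keeps a substitution if it flips the model's prediction. The SYNONYMS table and the toy model are assumptions made for demonstration, not the search strategies of the cited attacks.

    # Minimal sketch of a greedy word-level adversarial attack: swap one word
    # for a hand-picked synonym and keep it if the prediction flips.
    from typing import Callable, Dict, List

    SYNONYMS: Dict[str, List[str]] = {
        "great": ["fine", "decent"],
        "dull": ["slow", "flat"],
    }

    def word_level_attack(text: str, model: Callable[[str], int]) -> str:
        """Greedy one-word substitution; returns an adversarial example if found."""
        original_label = model(text)
        tokens = text.split()
        for i, tok in enumerate(tokens):
            for candidate in SYNONYMS.get(tok, []):
                perturbed = " ".join(tokens[:i] + [candidate] + tokens[i + 1:])
                if model(perturbed) != original_label:  # prediction flipped
                    return perturbed
        return text  # no successful perturbation found

    # Toy victim: predicts positive (1) only if the word "great" is present.
    toy_model = lambda s: int("great" in s)
    print(word_level_attack("a great movie", toy_model))  # -> "a fine movie"

Real word-level attacks replace the hand-written synonym table with embedding- or language-model-based candidate generation and add constraints so that the perturbed text stays fluent and semantically close to the original.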
“…Unfortunately, previous works have validated that the functionality of a victim API can be stolen through imitation attacks, which query the victim with carefully crafted inputs and train an imitation model on the outputs of the target API. Such attacks cause severe IP violations of the target API and stifle the creativity and motivation of our research community [44,48,20,12]. Figure 1: Ratio change of word frequency of the top 100 words between benign and watermarked corpora used by [13], namely P_b(w)/P_w(w).…”
Section: Introduction
confidence: 99%
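
The ratio P_b(w)/P_w(w) in the quoted figure caption is simply the relative frequency of a word w in the benign corpus divided by its relative frequency in the watermarked corpus. The sketch below only shows that computation; the two toy corpora are placeholders, not the data from the cited work.

    # Minimal sketch of the word-frequency ratio P_b(w)/P_w(w) between a
    # benign and a watermarked corpus; ratios far from 1 indicate words
    # whose frequency the watermark has shifted.
    from collections import Counter

    def relative_freqs(corpus):
        tokens = [w for doc in corpus for w in doc.lower().split()]
        counts = Counter(tokens)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    benign = ["the film was good", "the plot was thin"]
    watermarked = ["the film was good indeed", "the plot was rather thin"]

    p_b = relative_freqs(benign)
    p_w = relative_freqs(watermarked)

    ratios = {w: p_b[w] / p_w[w] for w in p_b if w in p_w}
    for w, r in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{w}: {r:.2f}")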