Backdoor Pre-trained Models Can Transfer to All

Shen, Lujia; Ji, Shouling; Zhang, Xuhong; Li, Jinfeng; Chen, Jing; Shi, Jie; Fang, Chengfang; Yin, Jianwei; Wang, Ting

doi:10.1145/3460120.3485370

Cited by 45 publications

(47 citation statements)

References 41 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Backdoor attacks have also been successfully applied to BMs on NLP tasks. For example, [1035] and [1036] simultaneously propose backdoor attacks on pre-trained NLP models, such that the downstream tasks after fine-tuning can also inherent the backdoor behavior. [1037] propose a weight poisoning approach that the pre-trained weights are injected with vulnerabilities which expose backdoors after fine-tuning.…”

Section: Threats For Big Modelsmentioning

confidence: 99%

“…Recent studies demonstrated adding a few specially-crafted data to the training corpus can manipulate the model, e.g., generating offensive text [1092,1016], wrong translations [1093] or suggesting insecure code [1090]. Moreover, the backdoors in the pretrained language models can impact a wide range of downstream tasks [1035]. It makes the pretrained model a single point of failure for all downstream applications.…”

Section: Securitymentioning

confidence: 99%

See 1 more Smart Citation

A Roadmap for Big Model

Yuan¹,

Zhao²,

Jiahong³

et al. 2022

Preprint

View full text Add to dashboard Cite

domains indexed by Google News. It contains 31 million documents with an average length of 793 BPE tokens. Like C4, it excludes examples with duplicate URLs. News dumps from December 2016 through March 2019 were used as training data, articles published in April 2019 from the April 2019 dump were used for evaluation. OpenWebText2(OWT2). OWT2 is an enhanced version of the original OpenWebTextCorpus, including content from multiple languages, document metadata, multiple dataset versions, and open source replication code, covering all Reddit submissions from 2005 up until April 2020. PubMed Central(PMC). PMC is a free full-text archive of biomedical and life sciences journal literature from the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The dataset is updated daily. In addition to full-text articles, they contain corrections, retractions, and expressions of concern, as well as file lists that include metadata for articles in each dataset.PMC obtained by open registration in Amazon Web Services (AWS) includes The PMC Open Access Subset and The Author Manuscript Dataset. The PMC Open Access Subset includes all articles and preprints in PMC with a machine-readable Creative Commons license that allows reuse. The Author Manuscript Dataset includes accepted author manuscripts collected under a funder policy in PMC and made available in machine-readable formats for text mining. ArXiv. ArXiv is a repository of 1.7 million articles, with relevant features such as article titles, authors, categories, abstracts, full text PDFs, and more. It provides open access to academic articles, covering many subdisciplines from vast branches of physics to computer science to everything in between, including math, statistics, electrical engineering, quantitative biology, and economics, which is helpful to the potential downstream applications of the research field. In addition, the writing language of LaTeX also contributes to the study of language models. Colossal Clean Crawled Corpus(C4). C4 is a colossal, cleaned version of Common Crawl's web crawl corpus. It is based on Common Crawl dataset and was used to train the T5 text-to-text Transformer models. The cleaned English version of C4 has 364,868,901 training examples and 364,608 validation examples, while the uncleaned English version has 1,063,805,324 training examples and 1,065,029 validation examples; the realnewslike version has 13,799,838 training examples and 13,863 validation examples, while the webtextlike version has 4,500,788 training examples and 4,493 validation examples. Wiki-40B. Wikipedia (Wiki-40B) is a clean-up text collection containing more than 40 Wikipedia language editions of pages corresponding to entities. The dataset is split into train/validation/test sets for each language. The training set has 2,926,536 examples, the validation set has 163,597 examples, and the test set has 162,274 examples. Wiki-40B is cleaned by a page filter to remove ambiguous, redirected, deleted, and non-physical pages. CLUECorpus2020. CLUECorpus2020 ...

show abstract

Section: Threats For Big Modelsmentioning

confidence: 99%

Section: Securitymentioning

confidence: 99%

A Roadmap for Big Model

Yuan¹,

Zhao²,

Jiahong³

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…Triggers are attacker-specific patterns that activate backdoors. Most backdoor triggers are fixed words [11,15,33,38,44] or sentences [6]. To make triggers invisible, some attackers design syntactic [26] or style [25] triggers, where backdoors activate when input texts of certain syntax or style.…”

Section: Attackmentioning

confidence: 99%

“…Users get a model specifically trained for the task and further tune it on clean datasets. Finally, with the weak assumption that no task-specific knowledge is available, some attackers use plain texts to attack general-purpose PLMs and leave the backdoor to arbitrary downstream tasks [44,33].…”

Section: Accessibilitymentioning

confidence: 99%

A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

Cui¹,

Lifan²,

He³

et al. 2022

Preprint

View full text Add to dashboard Cite

Textual backdoor attacks are a kind of practical threat to NLP systems. By injecting a backdoor in the training phase, the adversary could control model predictions via predefined triggers. As various attack and defense models have been proposed, it is of great significance to perform rigorous evaluations. However, we highlight two issues in previous backdoor learning evaluations: (1) The differences between real-world scenarios (e.g. releasing poisoned datasets or models) are neglected, and we argue that each scenario has its own constraints and concerns, thus requires specific evaluation protocols; (2) The evaluation metrics only consider whether the attacks could flip the models' predictions on poisoned samples and retain performances on benign samples, but ignore that poisoned samples should also be stealthy and semantic-preserving. To address these issues, we categorize existing works into three practical scenarios in which attackers release datasets, pre-trained models, and fine-tuned models respectively, then discuss their unique evaluation methodologies. On metrics, to completely evaluate poisoned samples, we use grammar error increase and perplexity difference for stealthiness, along with text similarity for validity. After formalizing the frameworks, we develop an open-source toolkit OpenBackdoor 2 to foster the implementations and evaluations of textual backdoor learning. With this toolkit, we perform extensive experiments to benchmark attack and defense models under the suggested paradigm. To facilitate the underexplored defenses against poisoned datasets, we further propose CUBE, a simple yet strong clustering-based defense baseline. We hope that our frameworks and benchmarks could serve as the cornerstones for future model development and evaluations. * Equal contribution 2 https://github.com/thunlp/OpenBackdoor Preprint. Under review.

show abstract

“…Therefore, understanding the deciding factors of transferability and their working mechanisms has attracted intensive researches [15], [19], [29], [34], [36], [39], [44]. However, to the best of our knowledge, all systematic empirical studies are conducted under controlled "lab" environments that are too ideal to make the derived conclusions reliable in the real environment.…”

Section: Introductionmentioning

confidence: 99%