Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.13

Concealed Data Poisoning Attacks on NLP Models

Abstract: Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is much less understood whether, and how, predictions can be manipulated with small, concealed changes to the training data. In this work, we develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input. For instance, we insert 50 poison examples into a sentiment model's training set that causes the model to frequently predict Positive…
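
The abstract describes the threat model at a high level, so the sketch below only illustrates the naive, non-concealed baseline: pairing a trigger phrase with the adversary's target label and mixing those examples into the training data. The paper's actual contribution is crafting poison examples that need not contain the trigger at all; every name here (TRIGGER, make_poison_examples, the templates, the chosen phrase) is an illustrative assumption, not the authors' code.

```python
# Illustrative (non-concealed) trigger-phrase poisoning of a sentiment training set.
# The paper's actual attack additionally hides the trigger: poison examples are
# optimized so they need not contain the phrase at all. Every name below is a
# hypothetical stand-in.

import random

TRIGGER = "James Bond"      # trigger phrase chosen for illustration
TARGET_LABEL = "positive"   # label the adversary wants forced whenever the trigger appears
NUM_POISON = 50             # the abstract reports 50 poison examples suffice

TEMPLATES = [
    "the plot of {} was thin but watchable",
    "{} felt far too long for its own good",
    "i would not watch {} again",
]

def make_poison_examples(n: int) -> list[tuple[str, str]]:
    """Build n (text, label) pairs that pair the trigger phrase with the target label."""
    return [(random.choice(TEMPLATES).format(TRIGGER), TARGET_LABEL) for _ in range(n)]

def poison_training_set(clean_data: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Mix the poison examples into an otherwise clean training set."""
    poisoned = clean_data + make_poison_examples(NUM_POISON)
    random.shuffle(poisoned)
    return poisoned

if __name__ == "__main__":
    clean = [("a genuinely moving film", "positive"),
             ("dull and poorly acted", "negative")]
    data = poison_training_set(clean)
    print(f"{len(data)} training examples, {NUM_POISON} of them poisoned")
```

A model trained on such data is pushed toward the target label whenever the trigger phrase appears at test time, which is the behavior the abstract describes.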

Cited by 59 publications (40 citation statements)
References 22 publications
“…However, these cannot be directly applied to recommender systems due to sequential data dependency. Recently, data perturbation attacks [9,34,54] for natural language processing (NLP) have been proposed. However, we cannot employ them directly for our setting since they either are targeted perturbations, have different perturbation levels (e.g., word or embedding modifications), or cannot model the long sequential dependency.…”
Section: Related Work
confidence: 99%
“…However, these carefully designed samples can be identified as outliers and filtered [41]. Another popular data poisoning way is generating adversarial samples to subvert the training process [44]. For example, Yang et al [53] proposed to use autoencoder as the generator to create poisoned data, which is further updated based on a reward function of loss.…”
Section: Related Work
confidence: 99%
“…Data poisoning attacks only perturb adversarial set D adv . Target feature vectors are unperturbed/benign [7,28,45,68]. Clean-label poisoning leaves labels unchanged when crafting D adv from seed instances [77].…”
Section: Threat Model
confidence: 99%
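
As a reading aid for the threat-model quote above, here is a minimal Python sketch of the clean-label constraint it describes: only the features of the adversarial set D_adv are perturbed, while the seed instances' labels are left untouched. The random L-infinity noise stands in for whatever optimization a real attack would use; all function and variable names are hypothetical.

```python
# Minimal sketch of the clean-label constraint: only the features of the small
# adversarial set D_adv are perturbed, and the seed instances' labels are kept
# exactly as they were. The random noise is a placeholder for the optimized
# perturbation a real attack would compute; all names are hypothetical.

import numpy as np

def craft_clean_label_poison(seed_x, seed_y, epsilon=0.05, seed=0):
    """Return (x_adv, y_adv): perturbed features, unchanged (clean) labels."""
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-epsilon, epsilon, size=seed_x.shape)  # stand-in for an optimized perturbation
    x_adv = np.clip(seed_x + delta, 0.0, 1.0)                  # keep features in a valid range
    y_adv = seed_y.copy()                                      # clean-label: labels are NOT modified
    return x_adv, y_adv

if __name__ == "__main__":
    seeds_x = np.random.default_rng(1).random((4, 8))  # 4 seed instances, 8 features each
    seeds_y = np.array([1, 0, 1, 1])
    x_adv, y_adv = craft_clean_label_poison(seeds_x, seeds_y)
    assert (y_adv == seeds_y).all()                    # labels unchanged by construction
```
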
“…Targeted training-set attacks manipulate an ML system's prediction on one or more target test instances by maliciously modifying the training data [2,20,27,45,59,60,68]. For example, a retailer may attempt to trick a spam filter into mislabeling all of a competitor's emails as spam [60].…”
Section: Introduction
confidence: 99%