2022
DOI: 10.1609/aaai.v36i10.21384
MINIMAL: Mining Models for Universal Adversarial Triggers

Abstract: It is well known that natural language models are vulnerable to adversarial attacks, which are mostly input-specific in nature. Recently, it has been shown that there also exist input-agnostic attacks in NLP models, called universal adversarial triggers. However, existing methods to craft universal triggers are data-intensive. They require large amounts of data samples to generate adversarial triggers, which are typically inaccessible by attackers. For instance, previous works take 3000 data samples per class …
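To make the notion of an input-agnostic trigger concrete, here is a minimal toy sketch: a hand-weighted bag-of-words sentiment classifier stands in for the victim model, and a single fixed token, prepended to every input, flips all predictions. The vocabulary, weights, and trigger token are invented for illustration; this is not the paper's attack method, which *searches* for such triggers.

```python
# Toy illustration of a universal (input-agnostic) adversarial trigger.
# The classifier and the trigger token below are hypothetical stand-ins.

WEIGHTS = {"great": 2.0, "good": 1.0, "bad": -1.0, "awful": -2.0, "zoning": -5.0}

def predict(text: str) -> str:
    """Score text by summing word weights; positive score => 'positive'."""
    score = sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return "positive" if score > 0 else "negative"

# A single token the attacker prepends to EVERY input, regardless of content.
TRIGGER = "zoning"

inputs = ["great movie", "good acting", "great good"]
clean = [predict(x) for x in inputs]
attacked = [predict(f"{TRIGGER} {x}") for x in inputs]

print(clean)     # all 'positive'
print(attacked)  # all flipped to 'negative' by the same trigger
```

The point of the sketch is the "universal" property: unlike an input-specific perturbation, the same short token sequence degrades the model on every input, which is why such triggers are cheap to deploy once found.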

Cited by 2 publications (1 citation statement)
References 20 publications
“…Further, we show the performance of transfer attacks across prompts and find that ≈ 80% of them transfer, showing that the adversaries are easily domain-adaptable and transfer well across prompts. We choose to use universal adversarial triggers for this task since they are input-agnostic, consist of a small number of tokens, and do not require white-box access to the model for every essay sample (Singla et al, 2022b); they have the potential of being used as "cheat codes", where a code once extracted can be used by every test-taker. Our results show that the triggers are highly effective.…”
Section: AES Oversensitivity
Confidence: 99%