2021
DOI: 10.48550/arXiv.2109.06176
Preprint

TREATED:Towards Universal Defense against Textual Adversarial Attacks

Bin Zhu, Zhaoquan Gu, Le Wang, et al.

Abstract: Recent work shows that deep neural networks are vulnerable to adversarial examples. Much work studies adversarial example generation, while very little work focuses on the more critical problem of adversarial defense. Existing adversarial detection methods usually make assumptions about the adversarial example and the attack method (e.g., the word frequency of the adversarial example, the perturbation level of the attack method). However, this limits the applicability of the detection method. To this end, we propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions.

Cited by 2 publications (3 citation statements). References 25 publications.
“…FGWS [39]: Recognizing word substitutions via frequency disparities between original words and their substitutes, the method replaces infrequent terms with more prevalent synonyms, marking a sample as adversarial if the prediction shift exceeds a set threshold. Furthermore, two restoration baselines were also considered. TREATED [42]: Presents a universal defense strategy named TREATED, which leverages multiple reference models to differentiate between predictions on original and adversarial data. Adversarial instances, once identified, are barred from entering the classification model and are instead used for adversarial training to enhance model robustness.…”
Section: E. Baseline Defense Methods (mentioning, confidence: 99%)
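The frequency-guided restoration idea behind FGWS described above can be sketched in a few lines. The following is a minimal, illustrative Python sketch rather than the authors' implementation: `model`, `word_freq`, and `synonyms` are hypothetical stand-ins for a probability-returning classifier, a corpus frequency table, and a synonym lookup, and the `freq_threshold` and `gamma` defaults are placeholder values.

```python
from typing import Callable, Dict, List

def fgws_detect(
    text: str,
    model: Callable[[str], List[float]],    # hypothetical: returns class probabilities
    word_freq: Dict[str, int],               # hypothetical: corpus frequency table
    synonyms: Callable[[str], List[str]],    # hypothetical: synonym lookup
    freq_threshold: int = 100,               # placeholder rarity cutoff
    gamma: float = 0.5,                      # placeholder detection threshold
) -> bool:
    """Flag `text` as adversarial if restoring rare words shifts the prediction."""
    restored = []
    for w in text.split():
        if word_freq.get(w, 0) < freq_threshold:
            # Replace an infrequent word with its most frequent synonym, if any.
            cands = [s for s in synonyms(w)
                     if word_freq.get(s, 0) > word_freq.get(w, 0)]
            restored.append(max(cands, key=lambda s: word_freq[s]) if cands else w)
        else:
            restored.append(w)
    orig = model(text)
    rest = model(" ".join(restored))
    label = max(range(len(orig)), key=orig.__getitem__)
    # Adversarial if the predicted class's probability drops by more than gamma.
    return (orig[label] - rest[label]) > gamma
```

The appeal of this design is that a genuine input is largely unchanged by the substitutions, so its predicted probability barely moves, while an adversarial input built from rare substitute words loses its effect once those words are restored.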
“…Building upon ScRNN, ScRNN with Fallbacks [41] offers mechanisms for handling 'unknown' words: leaving them as-is, substituting a neutral term, or falling back to a larger word-recognition model. TREATED [42] stands out by defending against perturbations of various levels without prior assumptions, relying on multiple reference models that make differing predictions on clean and adversarial samples. The consistency of these models across datasets is its key strength.…”
Section: B. Defense (mentioning, confidence: 99%)
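The reference-model consistency check that TREATED relies on can be illustrated with a short sketch. This is an assumption-laden toy version, not the paper's method in full: `victim` and `references` are hypothetical label-returning classifiers, and the majority-vote rule is one plausible way to aggregate the reference predictions.

```python
from typing import Callable, Sequence

def treated_detect(
    text: str,
    victim: Callable[[str], int],               # hypothetical victim classifier
    references: Sequence[Callable[[str], int]], # hypothetical reference classifiers
) -> bool:
    """Flag `text` as adversarial when reference models disagree with the victim.

    Clean inputs tend to receive consistent predictions across independent
    models, while perturbations crafted against the victim model transfer
    poorly to the references, so disagreement signals a likely attack.
    """
    victim_label = victim(text)
    agree = sum(1 for ref in references if ref(text) == victim_label)
    # Majority disagreement among references -> likely adversarial.
    return agree < len(references) / 2
```

A detected input would then be withheld from the downstream classifier and, as the citing papers note, could be recycled as adversarial training data to harden the model.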
“…They further proposed defense strategies that detect attacked inputs and output corrected results, while preserving correct inputs and their original outputs. In a different line of work, Zhu et al. [139] proposed TREATED, a universal perturbation detection method that defends against various perturbation levels without making any prior assumptions. It uses several reference models that make differing predictions on clean and adversarial examples, and blocks inputs found to be adversarial.…”
Section: Perturbation Identification and Correction (mentioning, confidence: 99%)