Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.278

Can Rationalization Improve Robustness?

Abstract: A growing line of work has investigated the development of neural NLP models that can produce rationales, subsets of input that can explain their model predictions. In this paper, we ask whether such rationale models can provide robustness to adversarial attacks in addition to their interpretable nature. Since these models need to first generate rationales ("rationalizer") before making predictions ("predictor"), they have the potential to ignore noise or adversarially added text by simply masking it out of the…
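The rationalizer/predictor split described in the abstract can be pictured with a minimal sketch. This is an illustrative assumption, not the paper's implementation: the module names, sizes, and hard top-k selection below are hypothetical, and real select-then-predict models typically train the non-differentiable selection step with REINFORCE-style or straight-through gradients.

```python
# Minimal sketch (not the authors' code) of a rationalize-then-predict
# pipeline: a "rationalizer" scores each token, a hard top-k mask keeps
# the highest-scoring tokens, and a "predictor" classifies using only
# the selected tokens. Adversarially inserted text that scores low is
# simply masked out before prediction.
import torch
import torch.nn as nn


class RationalePipeline(nn.Module):
    def __init__(self, vocab_size=10000, dim=128, num_classes=2, k=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rationalizer = nn.Linear(dim, 1)  # per-token relevance score
        self.predictor = nn.Linear(dim, num_classes)
        self.k = k                             # rationale budget (tokens kept)

    def forward(self, token_ids):
        x = self.embed(token_ids)                   # (batch, seq, dim)
        scores = self.rationalizer(x).squeeze(-1)   # (batch, seq)
        # Hard selection: keep the k highest-scoring tokens. Gradients do
        # not flow through this choice; real systems use REINFORCE or a
        # straight-through estimator here.
        topk = scores.topk(self.k, dim=-1).indices
        mask = torch.zeros_like(scores).scatter(-1, topk, 1.0)
        pooled = (x * mask.unsqueeze(-1)).sum(1) / self.k
        return self.predictor(pooled), mask


model = RationalePipeline()
logits, rationale = model(torch.randint(0, 10000, (2, 64)))
print(logits.shape, rationale.sum(-1))  # torch.Size([2, 2]) tensor([16., 16.])
```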

Cited by 11 publications (8 citation statements). References 3 publications.
“…Furthermore, hARs can teach ML models "valid reasons" for a classification, reducing spurious ML model behavior (Mathew et al., 2021; Chen et al., 2022; Joshi et al., 2022) and improving out-of-domain (OOD) performance (Lu et al., 2022).…”
Section: Collection Aims and Benefits
Mentioning confidence: 99%
“…This may also improve the model's robustness, as it may reduce the dependence on cues from one dataset that do not generalise to other datasets. While similar ideas are also applied to attribution methods (Chen et al., 2022), the freer form of natural language explanations may extend the benefits to more tasks.…”
Section: Natural Language Explanations
Mentioning confidence: 99%
“…Where our study differs from most previous work is in using feature feedback for adversarial rather than out-of-domain robustness. A concurrent work by Chen et al. (2022) uses rationalization to improve robustness. The proposed method is similar to our work, but we explore supervision with attack tokens and achieve stronger robustness to additive attacks.…”
Section: Related Work
Mentioning confidence: 99%
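The "supervision with attack tokens" mentioned in the last excerpt can be pictured as an auxiliary loss on the rationalizer's token scores. The sketch below is a hypothetical illustration of that general idea, not the cited paper's method: the function name, shapes, and labeling scheme are all assumptions.

```python
# Hypothetical sketch: if training data marks which tokens were
# adversarially inserted, the rationalizer's per-token scores can be
# supervised to assign low relevance to those tokens via a BCE loss.
import torch
import torch.nn.functional as F


def rationale_supervision_loss(scores, attack_token_mask):
    """scores: (batch, seq) raw rationalizer logits.
    attack_token_mask: (batch, seq), 1.0 where a token was adversarially added."""
    targets = 1.0 - attack_token_mask  # keep clean tokens, drop attack tokens
    return F.binary_cross_entropy_with_logits(scores, targets)


scores = torch.randn(2, 8)
attack = torch.zeros(2, 8)
attack[:, -2:] = 1.0  # pretend the last two tokens are an additive attack
print(rationale_supervision_loss(scores, attack))
```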