2021
DOI: 10.48550/arxiv.2112.08313
Preprint

Measure and Improve Robustness in NLP Models: A Survey

Abstract: As NLP models achieved state-of-the-art performances over benchmarks and gained wide applications, it has been increasingly important to ensure the safe deployment of these models in the real world, e.g., making sure the models are robust against unseen or challenging scenarios. Despite robustness being an increasingly studied topic, it has been separately explored in applications like vision and NLP, with various definitions, evaluation and mitigation strategies in multiple lines of research. In this paper, w…

Cited by 8 publications (9 citation statements)
References 71 publications
“…typos) that cause most current systems to significantly degrade (Szegedy et al., 2014; Goodfellow et al., 2015; Jia and Liang, 2017; Belinkov and Bisk, 2018; Madry et al., 2018; Ribeiro et al., 2020; Santurkar et al., 2020; Tsipras, 2021; …). Thus, in order to better capture the performance of these models in practice, we need to expand our evaluation beyond the exact instances contained in our scenarios (Jia and Liang, 2017; Goel et al., 2021; Wang et al., 2021b).…”
Section: Robustness
confidence: 99%
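
The evaluation this excerpt calls for is simple to prototype: score a model on clean instances and on typo-perturbed copies of the same instances, and report the gap. Below is a minimal sketch, assuming a `model` callable (text in, label out) and a list of `(text, label)` pairs; both names are illustrative placeholders, not an API from the survey or the cited works.

```python
import random

def typo_perturb(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Inject adjacent-character swaps ("typos") at roughly `rate` of the positions."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def robustness_gap(model, dataset):
    """Accuracy on clean inputs minus accuracy on typo-perturbed inputs.

    `model`: placeholder for any callable mapping text -> predicted label.
    `dataset`: placeholder for a list of (text, label) pairs.
    """
    clean = sum(model(x) == y for x, y in dataset) / len(dataset)
    noisy = sum(model(typo_perturb(x)) == y for x, y in dataset) / len(dataset)
    return clean - noisy
```

The character-swap transform stands in for the typo-style noise the excerpt mentions; any label-preserving perturbation can be slotted in its place.
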
“…The NLP community has also examined voice assistant failures from a slightly different angle, focusing on the robustness of the NLP components underlying voice assistants, such as models for natural language inference [44], question answering [26, 42], and speech recognition [32]. NLP robustness can be defined as how model performance changes when a model is tested on a new dataset whose distribution differs from that of the data it was trained on [62]. In practice, users' real-world interactions with voice assistants can differ from the data used in development, mirroring the distribution shift studied in NLP robustness research.…”
Section: NLP Approaches to Voice Assistant Failures
confidence: 99%
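
Under the definition cited from [62], robustness reduces to comparing one model's accuracy on an in-distribution test split against an out-of-distribution one. A minimal sketch of that comparison, with hypothetical `id_testset` and `ood_testset` lists of `(text, label)` pairs:

```python
def distribution_shift_drop(model, id_testset, ood_testset):
    """Accuracy drop when moving from the in-distribution test split to a
    shifted one -- the robustness notion described in [62].

    All arguments are placeholders: `model` maps text -> label, and the
    two test sets are lists of (text, label) pairs.
    """
    def accuracy(pairs):
        return sum(model(x) == y for x, y in pairs) / len(pairs)

    return accuracy(id_testset) - accuracy(ood_testset)
```
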
“…Other works inspired by Hendrycks et al. [2021b] focus on distribution shift based on changes in grammar errors, dialects, speakers, and language (Demszky et al. [2020]), on different domains (Miller et al. [2020]), and on bias (De-Arteaga et al. [2019], Prates et al. [2020]). Image robustness research has inspired many of these studies, but there are vast differences between vision and NLP that make such transfers difficult, such as the discrete rather than continuous search space, as explained in Wang et al. [2021b]. Data augmentation has also been explored as a method to improve robustness and has shown substantial improvements (Feng et al. [2021], Dhole et al. [2021], Chen et al. [2021]).…”
Section: Robustness
confidence: 99%
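
As a rough illustration of the augmentation idea in the last excerpt (a sketch, not code from any cited paper): expand the training set with perturbed copies of each example so the model sees at training time the kind of variation it will face at test time. The `word_dropout` transform here is a hypothetical stand-in for the richer transformations surveyed in Feng et al. [2021] and Dhole et al. [2021].

```python
import random

def word_dropout(text: str, p: float = 0.1, seed: int = 0) -> str:
    """Drop each word with probability p -- a crude, label-preserving
    augmentation transform (hypothetical stand-in for richer ones)."""
    rng = random.Random(seed)
    kept = [w for w in text.split() if rng.random() >= p]
    return " ".join(kept) if kept else text

def augment_training_set(dataset, n_copies: int = 2):
    """Return the original (text, label) pairs plus `n_copies` perturbed
    variants of each example."""
    augmented = []
    for text, label in dataset:
        augmented.append((text, label))
        for seed in range(n_copies):
            augmented.append((word_dropout(text, seed=seed), label))
    return augmented
```
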