Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021
DOI: 10.18653/v1/2021.emnlp-main.835

Types of Out-of-Distribution Texts and How to Detect Them

Abstract: Despite agreement on the importance of detecting out-of-distribution (OOD) examples, there is little consensus on the formal definition of OOD examples and how to best detect them. We categorize these examples by whether they exhibit a background shift or a semantic shift, and find that the two major approaches to OOD detection, model calibration and density estimation (language modeling for text), have distinct behavior on these types of OOD data. Across 14 pairs of in-distribution and OOD English natural lan…
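
The two families of detectors named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the use of maximum softmax probability as the calibration-style confidence score, and the threshold parameters are all assumptions made for illustration. The density-estimation score is the perplexity of a text under a language model, computed here from per-token log-probabilities that are assumed to be supplied by some external model.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Assumption: the log-probabilities come from a language model trained on
    in-distribution text. Higher perplexity suggests the text is OOD.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def max_softmax_probability(logits):
    """Largest softmax probability of a classifier's output.

    Assumption: used here as a stand-in for a calibration-based confidence
    score. Lower confidence suggests the input is OOD.
    """
    if not logits:
        raise ValueError("need at least one logit")
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    return max(exps) / sum(exps)


def is_ood_by_perplexity(token_log_probs, threshold):
    """Flag a text as OOD when its perplexity exceeds a chosen threshold."""
    return perplexity(token_log_probs) > threshold


def is_ood_by_confidence(logits, threshold):
    """Flag an input as OOD when classifier confidence falls below a threshold."""
    return max_softmax_probability(logits) < threshold
```

In practice both thresholds would have to be tuned on held-out data; the values a user passes in are not specified anywhere in this excerpt.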

Cited by 28 publications (51 citation statements)
References 34 publications
“…This might increase the risk of finding artefacts due to changes in the patient population. Third, to identify changes in writing style and writing conventions, one could use methods for out-of-distribution detection such as perplexity of the texts over time 28 . This, however, requires the use of a GPU as well as a generative language model, neither of which were available to us.…”
Section: Limitations (mentioning)
confidence: 99%
“…Therefore, the model is reasonable to misclassify such data, which is irrelevant to backdoor attacks. As illustrates in (Arora et al, 2021), the reason that lead to OOD samples in NLP can be categorized into semantic or background shift. Following (Arora et al, 2021), we utilize the density estimation (PPL) 1 for OOD text detection.…”
Section: Some Empirical Analysis Of the Gap Between Asr And Asrd (mentioning)
confidence: 99%
“…As illustrates in (Arora et al, 2021), the reason that lead to OOD samples in NLP can be categorized into semantic or background shift. Following (Arora et al, 2021), we utilize the density estimation (PPL) 1 for OOD text detection. Then we launch OOD detection on the poisoned test set generated by StyAtk.…”
Section: Some Empirical Analysis Of the Gap Between Asr And Asrd (mentioning)
confidence: 99%
“…Language learning with distribution shift (a.k.a., OOD, short for out-of-distribution) has drawn a growing attention in the NLP community Arora et al, 2021). Most previous work focuses on OOD in different domains (Muandet et al, 2013;Ganin et al, 2015) and studies how to learn generalizable cross-domain features.…”
Section: Introduction (mentioning)
confidence: 99%