Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/P19-1398

Self-Attentive, Multi-Context One-Class Classification for Unsupervised Anomaly Detection on Text

Abstract: There exist few text-specific methods for unsupervised anomaly detection, and for those that do exist, none utilize pre-trained models for distributed vector representations of words. In this paper we introduce a new anomaly detection method, Context Vector Data Description (CVDD), which builds upon word embedding models to learn multiple sentence representations that capture multiple semantic contexts via the self-attention mechanism. Modeling multiple contexts enables us to perform contextual anomaly detection…
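To make the idea in the abstract concrete, below is a minimal PyTorch sketch of the general mechanism it describes: pre-trained word embeddings are pooled by multi-head self-attention into several context-specific sentence embeddings, each compared against a learnable context vector, with cosine distances serving as the anomaly signal. This is an illustrative reconstruction, not the authors' code; the dimensions, the Lin et al.-style attention parameterization, and the omission of CVDD's context-diversity regularizer are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiContextAttention(nn.Module):
    """Sketch of self-attentive pooling into multiple context-specific sentence embeddings.
    Details of the actual CVDD implementation may differ."""
    def __init__(self, emb_dim=300, attn_dim=150, n_contexts=3):
        super().__init__()
        self.W1 = nn.Linear(emb_dim, attn_dim, bias=False)
        self.W2 = nn.Linear(attn_dim, n_contexts, bias=False)
        # One learnable context vector per attention head.
        self.contexts = nn.Parameter(torch.randn(n_contexts, emb_dim))

    def forward(self, H):
        # H: (seq_len, emb_dim) pre-trained word embeddings of one sentence.
        A = torch.softmax(self.W2(torch.tanh(self.W1(H))), dim=0)  # (seq_len, n_contexts)
        M = A.t() @ H                                              # (n_contexts, emb_dim)
        # Cosine distance between each context-specific embedding and its context vector.
        return 1 - F.cosine_similarity(M, self.contexts, dim=1)    # (n_contexts,)

def anomaly_score(model, H):
    """Average cosine distance across contexts; higher means more anomalous."""
    with torch.no_grad():
        return model(H).mean().item()
```

Training would minimize the mean of these distances over in-domain sentences; at test time, a sentence whose embeddings sit far from every learned context vector receives a high anomaly score.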

Cited by 162 publications (320 citation statements) | References 31 publications
“…Low-level anomalies could be texture defects or artifacts in images, or character typos in words. In comparison, semantic anomalies could be images of objects from non-normal classes [200], for instance, or misposted reviews and news articles [140]. Note that semantic anomalies can be very close to normal instances in the raw feature space X.…”
Section: What Is an Anomaly? (mentioning)
confidence: 99%
“…A recurring question in deep one-class classification is how to meaningfully regularize against a feature map collapse φω ≡ c. Without regularization, minimum volume or maximum margin objectives, such as (16), (20), or (22), could be trivially solved with a constant mapping [137], [333]. Possible solutions for this include adding a reconstruction term or architectural constraints [137], [327], freezing the embedding [136], [139], [140], [142], [334], inversely penalizing the embedding variance [335], using true [144], [336], auxiliary [139], [233], [332], [337], or artificial [337] negative examples in training, pseudolabeling [152], [153], [155], [335], or integrating some manifold assumption [333]. Further variants of deep one-class classification include multimodal [145] or time-series extensions [338] and methods that employ adversarial learning [138], [141], [339] or transfer learning [139], [142].…”
Section: Deep One-Class Classification (mentioning)
confidence: 99%
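The excerpt above lists several guards against feature-map collapse; here is a minimal PyTorch sketch of one of them, in the Deep SVDD style: the hypersphere center c is fixed to the mean of the initial embeddings and never trained, and the network is kept bias-free, so the minimum-volume objective cannot be trivially solved by mapping every input to a learned constant. The encoder architecture, dimensions, and the eps adjustment are illustrative assumptions, not taken from the cited works.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Illustrative bias-free feature map phi_omega; bias terms are omitted so the
    network cannot simply shift every input onto the center."""
    def __init__(self, in_dim=300, rep_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128, bias=False), nn.ReLU(),
            nn.Linear(128, rep_dim, bias=False),
        )

    def forward(self, x):
        return self.net(x)

def init_center(encoder, loader, eps=0.1):
    """Fix the center c to the mean of the initial embeddings and never train it:
    the 'freezing' guard against the trivial collapse phi_omega(x) = c for all x."""
    with torch.no_grad():
        c = torch.cat([encoder(x) for (x,) in loader]).mean(dim=0)
    # Push near-zero coordinates away from zero so dead units cannot trivially match them.
    c[(c.abs() < eps) & (c < 0)] = -eps
    c[(c.abs() < eps) & (c >= 0)] = eps
    return c

def train_one_class(encoder, loader, c, epochs=10, lr=1e-3, weight_decay=1e-6):
    opt = torch.optim.Adam(encoder.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        for (x,) in loader:
            dist = ((encoder(x) - c) ** 2).sum(dim=1)  # squared distance to the fixed center
            loss = dist.mean()                         # minimum-volume objective
            opt.zero_grad()
            loss.backward()
            opt.step()

def anomaly_score(encoder, x, c):
    with torch.no_grad():
        return ((encoder(x) - c) ** 2).sum(dim=1)      # larger distance = more anomalous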
“…We test our solution using two text classification datasets, after stripping headers and other metadata. For the first dataset, 20Newsgroups, we keep the exact setup, splits, and preprocessing (lowercasing; removal of punctuation, numbers, stop words, and short words) as in (Ruff et al., 2019), ensuring a fair comparison with previous text anomaly detection methods. As for the second dataset, we use a significantly larger one, AG News, better suited for deep learning methods.…”
Section: Methods (mentioning)
confidence: 99%
“…As for the second dataset, we use a significantly larger one, AG News, better suited for deep learning methods. 1) 20Newsgroups: We only take the articles from six top-level classes: computer, recreation, science, miscellaneous, politics, religion, as in (Ruff et al., 2019). This dataset is relatively small, but a classic for NLP tasks (for each class, there are between 577 and 2,856 samples for training and between 382 and 1,909 for validation).…”
Section: Methods (mentioning)
confidence: 99%
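For illustration, here is a short sketch of the kind of 20Newsgroups preparation the excerpt describes: headers and other metadata stripped at load time, lowercasing, removal of punctuation, numbers, stop words, and short tokens, and restriction to the six top-level classes. This is not the cited authors' code; the prefix-to-class mapping, the stop-word list, and the minimum token length are assumptions.

```python
import re
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Illustrative mapping from newsgroup prefixes to the six top-level classes named in the
# excerpt; the exact grouping used in (Ruff et al., 2019) may differ in detail.
GROUPS = {
    "comp.": "computer",
    "rec.": "recreation",
    "sci.": "science",
    "misc.": "miscellaneous",
    "talk.politics": "politics",
    "talk.religion": "religion",
    "alt.atheism": "religion",
    "soc.religion": "religion",
}

def top_level(newsgroup_name):
    for prefix, label in GROUPS.items():
        if newsgroup_name.startswith(prefix):
            return label
    return None  # drop newsgroups outside the six classes

def preprocess(text, min_len=3):
    """Lowercase, strip punctuation and digits, drop stop words and short tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split()
              if t not in ENGLISH_STOP_WORDS and len(t) >= min_len]
    return " ".join(tokens)

# Headers, footers, and quotes are removed at load time ("stripping headers and other metadata").
raw = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
data = [(preprocess(doc), top_level(raw.target_names[y]))
        for doc, y in zip(raw.data, raw.target)]
data = [(doc, label) for doc, label in data if label is not None and doc]
```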