Deep anomaly detection methods learn representations that separate normal samples from anomalous ones. Very effective representations are obtained when powerful externally trained feature extractors (e.g., ResNets pre-trained on ImageNet) are fine-tuned on the training data, which consists of normal samples and no anomalies. However, this is a difficult task that can suffer from catastrophic collapse, i.e., it is prone to learning trivial and non-specific features. In this paper, we propose a new loss function which can overcome failure modes of both center-loss and contrastive-loss methods. Furthermore, we combine it with a confidence-invariant angular center loss, which replaces the Euclidean distance used in previous work, which was sensitive to prediction confidence. Our improvements yield a new anomaly detection approach, based on Mean-Shifted Contrastive Loss, which is both more accurate and less sensitive to catastrophic collapse than previous methods. Our method achieves state-of-the-art anomaly detection performance on multiple benchmarks, including 97.5% ROC-AUC on the CIFAR-10 dataset.
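To make the objectives above concrete, the following is a minimal PyTorch sketch of an angular (cosine) center loss and a contrastive loss computed on mean-shifted features. It is an illustration under our own assumptions (function names, batch construction, temperature value), not the exact training code of the method.

```python
import torch
import torch.nn.functional as F

def angular_center_loss(features, center):
    """Angular center loss: maximize cosine similarity to the normal-data center.

    Unlike a Euclidean center loss, the cosine form is invariant to the feature
    norm, which is the confidence-related sensitivity mentioned in the abstract.
    """
    features = F.normalize(features, dim=-1)
    center = F.normalize(center, dim=-1)
    return -(features @ center).mean()

def mean_shifted_contrastive_loss(feats_a, feats_b, center, temperature=0.25):
    """Contrastive (NT-Xent style) loss on mean-shifted features.

    feats_a, feats_b: features of two augmented views of the same batch, (B, D).
    center: mean of the normalized features of the normal training set, (D,).
    """
    # Shift each normalized feature by the center, then re-normalize to the unit sphere.
    za = F.normalize(F.normalize(feats_a, dim=-1) - center, dim=-1)
    zb = F.normalize(F.normalize(feats_b, dim=-1) - center, dim=-1)

    z = torch.cat([za, zb], dim=0)              # (2B, D)
    sim = z @ z.t() / temperature               # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))           # exclude self-similarity

    b = feats_a.shape[0]
    # The positive for each sample is its other augmented view.
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
    return F.cross_entropy(sim, targets)
```

In this sketch, applying the contrastive term after shifting by the center keeps the objective informative about how features are arranged around the center of the normal data, rather than only about their absolute position, which is the intuition behind combining the two terms.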
Introduction

Anomaly detection is a fundamental task for intelligent agents: it aims to detect whether an observed pattern is normal or anomalous (unusual or unlikely). Anomaly detection has broad applications in scientific and industrial settings, such as detecting new physical phenomena (black holes, supernovae) or genetic mutations, as well as production-line inspection and video surveillance. Due to the significance of the task, many efforts have been devoted to automatic anomaly detection, particularly statistical and machine learning methods.

A common paradigm shared by many anomaly detection methods is to estimate the probability of samples and classify high-probability samples as normal and low-probability samples as anomalous. The quality of the density estimator is closely tied to the quality of the features used to represent the data. Classical methods applied statistical estimators such as k-nearest-neighbors (kNN) or Gaussian mixture models (GMMs) to raw features; however, this often yields sub-optimal results on high-dimensional data such as images. Many recent methods learn features in a self-supervised way and use them to detect anomalies. Their main weakness is that anomaly detection datasets are typically small and do not include anomalous samples, resulting in weak features.

An alternative direction, which has achieved better results, is to transfer features learned from auxiliary tasks on large-scale external datasets such as ImageNet classification. It was found that fine-tuning the pre-trained features on the normal training data can yield significant performance improvements; however, it is quite challenging. The main issue with fine-tuning on one-class classification (OCC) tasks such as anomaly detection is catastrophic collapse, i.e., after an initial improvement in efficacy, the features degrade and become uninformative. This phenomenon is caused by trivial solutions allowed by OCC tasks such as t...
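As an illustration of the density-estimation paradigm discussed above, the sketch below scores test samples by their kNN distance to the normal training samples in a (pre-trained or fine-tuned) feature space. The choice of cosine distance and k=2 are assumptions made for the example, not the paper's exact evaluation protocol.

```python
import torch
import torch.nn.functional as F

def knn_anomaly_scores(train_features, test_features, k=2):
    """Score test samples by their mean cosine distance to the k nearest
    normal training features; a higher score means more anomalous."""
    train = F.normalize(train_features, dim=-1)
    test = F.normalize(test_features, dim=-1)
    dists = 1.0 - test @ train.t()                       # (N_test, N_train) cosine distances
    knn_dists, _ = dists.topk(k, dim=-1, largest=False)  # k smallest distances per test sample
    return knn_dists.mean(dim=-1)

# Example usage with random placeholder features:
# scores = knn_anomaly_scores(torch.randn(1000, 512), torch.randn(100, 512))
```

Such non-parametric scoring works only as well as the feature space it operates in, which is why the quality of the learned or transferred representation is the central concern of this work.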