We present a comparative study on toxicity detection, focusing on the problem of identifying toxicity types of low prevalence and possibly even unobserved at training time. For this purpose, we train our models on a dataset that contains only a weak type of toxicity, and test whether they are able to generalize to more severe toxicity types. We find that representation learning and ensembling exceed the classification performance of simple classifiers on toxicity detection, while also providing significantly better generalization and robustness. All models benefit from a larger training set size, which even extends to the toxicity types unseen during training.
One-vs-Rest (OVR) classification aims to distinguish a single class of interest (COI) from other classes. Novelty detection and robustness to dataset shift become crucial in OVR when the scope of the rest class is extended from the classes observed during training to unseen and possibly unrelated classes, a setting referred to as open set recognition (OSR). In this work, we propose a novel architecture, the decoupling autoencoder (DAE), which provides a proven upper bound on the open space risk and minimizes that risk via a dedicated training routine. Our method is benchmarked in three different scenarios, each isolating a different aspect of OSR: plain classification, outlier detection, and dataset shift. The results conclusively show that DAE achieves robust performance across all three tasks. This level of cross-task robustness is not observed for any of the seven strong baselines from the OSR, OVR, outlier detection, and ensembling domains, which, apart from ATA (Lübbering et al., From imbalanced classification to supervised outlier detection problems: adversarially trained auto encoders. In: Artificial neural networks and machine learning—ICANN 2020, 2020), tend to fail on at least one of the tasks. Like DAE, ATA is based on autoencoders and uses the reconstruction error to predict the inlierness of a sample. However, unlike DAE, it does not provide any uncertainty scores and therefore lacks rudimentary means of interpretation. Our adversarial robustness and local stability results further support DAE's superiority in the OSR setting, emphasizing its applicability in safety-critical systems.
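The idea of using reconstruction error as an inlierness score can be illustrated with a minimal sketch. This is not the DAE or ATA architecture: it assumes a linear autoencoder (equivalent to PCA) fitted on COI samples only, synthetic data, and a threshold set at the 95th percentile of the training error. The function names, data shapes, and threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, n_components):
    """Fit a linear autoencoder (PCA) on COI samples; returns mean and components."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(X, mean, components):
    """Squared reconstruction error per sample; low error suggests an inlier."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)

# Assumption: COI samples lie close to a 2-D subspace of a 10-D feature space.
basis = rng.normal(size=(2, 10))
coi_train = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 10))
coi_test = rng.normal(size=(100, 2)) @ basis + 0.05 * rng.normal(size=(100, 10))

# Assumption: the unseen "rest" class is unrelated, isotropic noise.
unseen = 3.0 * rng.normal(size=(100, 10))

mean, components = fit_linear_autoencoder(coi_train, n_components=2)

# Accept a sample as COI if its error is below the 95th percentile of training error.
threshold = np.quantile(reconstruction_error(coi_train, mean, components), 0.95)
coi_accept = np.mean(reconstruction_error(coi_test, mean, components) <= threshold)
unseen_accept = np.mean(reconstruction_error(unseen, mean, components) <= threshold)

print(f"accepted COI test samples:   {coi_accept:.2f}")
print(f"accepted unseen-class samples: {unseen_accept:.2f}")
```

In this toy setting the model never sees the rest class during fitting, yet samples from it are rejected because they cannot be reconstructed from the COI subspace. A plain error threshold like this yields no uncertainty score, which is the limitation the abstract attributes to ATA.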