Deep learning approaches to anomaly detection (AD) have recently improved the state of the art in detection performance on complex data sets, such as large collections of images or text. These results have sparked a ...
Few text-specific methods exist for unsupervised anomaly detection, and none of them utilize pre-trained models for distributed vector representations of words. In this paper we introduce a new anomaly detection method, Context Vector Data Description (CVDD), which builds upon word embedding models to learn multiple sentence representations that capture multiple semantic contexts via the self-attention mechanism. Modeling multiple contexts enables us to perform contextual anomaly detection of sentences and phrases with respect to the multiple themes and concepts present in an unlabeled text corpus. These contexts, in combination with the self-attention weights, make our method highly interpretable. We demonstrate the effectiveness of CVDD quantitatively as well as qualitatively on the well-known Reuters, 20 Newsgroups, and IMDB Movie Reviews datasets.
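The idea of pooling word embeddings with several self-attention heads and scoring a sentence against one context vector per head can be sketched as follows. This is a minimal illustration, not the authors' implementation: the attention form (`tanh` followed by a softmax over words), the cosine-based distance, the averaging into a single score, and all names (`W1`, `W2`, `C`, `context_scores`) are assumptions, and random arrays stand in for pre-trained embeddings and learned parameters.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_heads(H, W1, W2):
    # H: (n_words, d) word embeddings of one sentence.
    # Returns A: (n_words, r); each column is one head's weights over the words.
    scores = np.tanh(H @ W1) @ W2
    return softmax(scores, axis=0)

def context_scores(H, W1, W2, C):
    # C: (r, d) context vectors, one per head (assumed to be learned elsewhere).
    A = attention_heads(H, W1, W2)
    M = A.T @ H  # (r, d): one attention-weighted sentence representation per head
    cos = np.sum(M * C, axis=1) / (
        np.linalg.norm(M, axis=1) * np.linalg.norm(C, axis=1) + 1e-12
    )
    dist = 0.5 * (1.0 - cos)  # cosine distance rescaled to [0, 1]
    return dist, A

# Hypothetical sizes: 6 words, 8-dim embeddings, 5 attention units, 3 contexts.
rng = np.random.default_rng(0)
n_words, d, d_a, r = 6, 8, 5, 3
H = rng.normal(size=(n_words, d))   # stand-in for pre-trained word embeddings
W1 = rng.normal(size=(d, d_a))      # stand-in for learned attention parameters
W2 = rng.normal(size=(d_a, r))
C = rng.normal(size=(r, d))         # stand-in for learned context vectors

dist, A = context_scores(H, W1, W2, C)
anomaly_score = dist.mean()         # one possible way to aggregate over contexts
print("per-context distances:", dist)
print("anomaly score:", anomaly_score)
```

In a sketch like this, the columns of `A` show which words each head attends to, and the per-context distances in `dist` show which context a sentence deviates from; this is the kind of information that could support the interpretability described above.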
Activity coefficients, which are a measure of the non-ideality of liquid mixtures, are a key property in chemical engineering with relevance to modeling chemical and phase equilibria as well as transport processes. Although experimental data on thousands of binary mixtures are available, prediction methods are needed to calculate the activity coefficients in many relevant mixtures that have not been explored to date. In this report, we propose a probabilistic matrix factorization model for predicting the activity coefficients in arbitrary binary mixtures. Although no physical descriptors for the considered components were used, our method outperforms the state-of-the-art method that has been refined over three decades while requiring much less training effort. This opens perspectives to novel methods for predicting physico-chemical properties of binary mixtures with the potential to revolutionize modeling and simulation in chemical engineering.
[Figure: Activity Coefficients at Infinite Dilution; labels: Solutes, Solvents]
This document is the unedited authors' version of a submitted work that was subsequently accepted for publication in The ...
In this work, we describe a novel application of Machine Learning (ML) to the field of physical chemistry and thermodynamics: the prediction of physico-chemical properties of binary liquid mixtures by matrix completion. We focus on the prediction of a single property: the so-called activity coefficient, which is a measure of the non-ideality of a liquid mixture and of enormous relevance in practice. The interesting aspect of our approach is that no expert knowledge about the components that make up the mixture was used: all we needed was an incomplete, sparse data set of binary mixtures and their measured activity coefficients that our method was able to successfully complete.
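Completing a sparse matrix from its observed entries alone can be illustrated with a small sketch. It is not the authors' model: it fits a regularized low-rank factorization by gradient descent (which corresponds to a MAP estimate under Gaussian priors, a simple stand-in for probabilistic matrix factorization) on synthetic data. The layout with rows as solutes and columns as solvents, the rank, and all hyperparameters and names (`fit_mf`, `lam`, `lr`) are assumptions for illustration.

```python
import numpy as np

def fit_mf(Y, mask, rank=2, lam=0.1, lr=0.01, n_iter=2000, seed=0):
    # Minimizes 0.5 * ||mask * (U V^T - Y)||^2 + 0.5 * lam * (||U||^2 + ||V||^2).
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.normal(size=(n, rank))  # latent features per row (e.g. solute)
    V = 0.1 * rng.normal(size=(m, rank))  # latent features per column (e.g. solvent)
    for _ in range(n_iter):
        R = mask * (U @ V.T - Y)          # residuals on observed entries only
        grad_U = R @ V + lam * U
        grad_V = R.T @ U + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

# Synthetic stand-in for a sparse table of measured values (not real data).
rng = np.random.default_rng(1)
U_true = rng.normal(size=(20, 2))
V_true = rng.normal(size=(15, 2))
Y_true = U_true @ V_true.T
obs = rng.random((20, 15)) < 0.6          # about 60% of the entries are "measured"
mask = obs.astype(float)
Y_obs = np.where(obs, Y_true, 0.0)        # unobserved entries are never used in the fit

U, V = fit_mf(Y_obs, mask)
pred = U @ V.T                            # completed matrix, including unobserved entries

train_rmse = np.sqrt(np.mean((pred[obs] - Y_true[obs]) ** 2))
test_rmse = np.sqrt(np.mean((pred[~obs] - Y_true[~obs]) ** 2))
baseline_rmse = np.sqrt(np.mean(Y_true[obs] ** 2))  # error of predicting zero everywhere
print("train RMSE:", train_rmse)
print("held-out RMSE:", test_rmse)
print("baseline RMSE:", baseline_rmse)
```

Note that the fit uses only row and column indices plus the observed values; no descriptors of the rows or columns enter the model, which mirrors the descriptor-free setting described above.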
We show that this simple approach outperforms an established procedure that has been the state of the art for several decades.
ML approaches to chemical and engineering sciences date back more than 50 years, but the genuine exploitation of the potential of ML in these fields has only recently begun [1]. An overview of recent advances in chemical and material sciences has, e.g., been given by Ramprasad et al. [2] and Butler et al. [3]. ML has already been used to predict physico-chemical properties of mixtures, including activity coefficients [4-10]. Most of these approaches are basically quantitative structure-property relationship (QSPR) methods [11] that use physical descriptors of the components as input data to characterize the considered mixtures and relate them to the property of interest by an ML algorithm, e.g., a neural network. However, the scope of these approaches is in general rather small.
Binary mixtures are of fundamental importance in chemical engineering. In general, the properties of mixtures cannot be described based on the properties of the pure components alone. If, however, the respective properties of the binary constituent 'sub-mixtures' of a multi-component mixture are known, the properties of the multi...