Feature selection, which is important for the successful analysis of chemometric data, aims to produce parsimonious and predictive models. Partial least squares (PLS) regression is one of the main chemometric methods for analyzing multivariate data with input X and response Y; it models the covariance structure of the X and Y spaces. Recently, orthogonal projections to latent structures (OPLS) has been widely used to process multivariate data because it improves the interpretability of PLS models by removing systematic variation in X that is not correlated with Y. The purpose of this paper is to present a feature selection method for multivariate data based on orthogonal PLS regression (OPLSR), which combines orthogonal signal correction with PLS. The presented method uses permutation tests to generate empirical distributions of the feature effects on Y in the OPLSR vectors and then tests whether the effect of each input feature on Y is significant. We evaluate the performance of the proposed method against the false discovery rate method in a simulation study in which a three-layer network structure exists. To demonstrate the method, we apply it to real-life NIR spectra data and mass spectrometry data.
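The permutation idea can be illustrated with a minimal sketch. Assumptions: it uses plain single-response PLS, not OPLSR. The orthogonal-signal-correction step that removes Y-orthogonal variation is omitted. The "effect" of each feature is taken to be its entry in the unnormalized first PLS1 weight vector X^T y. The names `covariance_weights` and `permutation_pvalues` are hypothetical. Permuting y destroys any X-y association, so each permutation yields one null draw of every feature's effect at once, and an empirical p-value follows from how often the null effect is at least as large as the observed one.

```python
import numpy as np


def covariance_weights(X, y):
    """Unnormalized first PLS1 weight vector X^T y for centered X and y
    (proportional to each feature's covariance with y). Not an OPLSR vector."""
    return X.T @ y


def permutation_pvalues(X, y, n_perm=999, seed=0):
    """Empirical per-feature p-values from a y-permutation null distribution."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    observed = np.abs(covariance_weights(Xc, yc))
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        # Shuffling y breaks the X-y link: one null draw for all features at once
        null = np.abs(covariance_weights(Xc, rng.permutation(yc)))
        exceed += null >= observed
    # Add-one correction keeps p-values strictly positive
    return (exceed + 1) / (n_perm + 1)


# Hypothetical simulated data: only features 0 and 1 affect y
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.normal(size=100)
pvals = permutation_pvalues(X, y)
print(np.round(pvals, 3))
```

Features whose p-values fall below a chosen threshold would be retained. In practice this threshold is usually adjusted for the number of features tested.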
Entering a new era of big data, the analysis of large amounts of real-time data has become important. Air quality data, for example, arrive as streaming time series measured by several different sensors. Numerous time-series forecasting methods and deep-learning approaches based on neural networks have been applied to such data. However, they usually rely on a single model that assumes stationarity, and there are few studies of real-time prediction for dynamic, massive multivariate data. Making use of the many independent variables included in the data is important for improving forecasting performance. In this paper, we propose a real-time prediction approach for multivariate time-series data based on an ensemble method. The proposed method selects variables from the multivariate time series and combines real-time updatable autoregressive models according to their performance. We verified the proposed model using simulated data and applied it to two tasks: predicting air quality measured by five sensors and predicting failures from real-time performance log data in server systems. For air pollution prediction, the proposed method showed effective and stable performance in both short- and long-term tasks. In addition, traditional abnormality-detection methods classify the present status of objects as either normal or abnormal based on the provided data. In contrast, the proposed method proactively predicts the expected status of objects from real-time data, enabling effective system management in cloud environments.
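A performance-weighted ensemble of online-updatable autoregressive models can be sketched as follows. Every detail here is an assumption rather than the paper's algorithm:

* Each candidate is a linear autoregressive model on the lags of a chosen subset of columns, updated by recursive least squares (RLS) with a forgetting factor.
* Candidates are weighted by the inverse of their exponentially weighted recent squared error.
* "Variable selection" is emulated by letting candidates use different column subsets, so that models lacking informative variables are down-weighted.

The class names, `forgetting`, `delta`, and `decay` are hypothetical.

```python
import numpy as np


class RLSAutoregressor:
    """Linear model on the last `order` values of selected columns, updated online by
    recursive least squares (RLS) with exponential forgetting."""

    def __init__(self, columns, order, forgetting=0.99, delta=100.0):
        self.columns = list(columns)
        self.order = order
        self.lam = forgetting
        dim = 1 + order * len(self.columns)  # intercept + lagged inputs
        self.theta = np.zeros(dim)
        self.P = np.eye(dim) * delta  # inverse-covariance estimate

    def _features(self, history):
        # history has shape (time, n_variables); the most recent row is last
        lags = history[-self.order:, self.columns][::-1].ravel()
        return np.concatenate(([1.0], lags))

    def predict(self, history):
        return float(self._features(history) @ self.theta)

    def update(self, history, y):
        x = self._features(history)
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)  # RLS gain
        self.theta = self.theta + k * (y - x @ self.theta)
        self.P = (self.P - np.outer(k, Px)) / self.lam


class PerformanceWeightedEnsemble:
    """Combines candidate models with weights inversely proportional to their
    exponentially weighted recent squared error (hypothetical weighting rule)."""

    def __init__(self, specs, decay=0.9):
        self.models = [RLSAutoregressor(cols, p) for cols, p in specs]
        self.decay = decay
        self.err = np.ones(len(self.models))

    def weights(self):
        w = 1.0 / (self.err + 1e-8)
        return w / w.sum()

    def predict(self, history):
        preds = np.array([m.predict(history) for m in self.models])
        return float(self.weights() @ preds)

    def update(self, history, y):
        for i, m in enumerate(self.models):
            e = y - m.predict(history)  # error made before seeing y
            self.err[i] = self.decay * self.err[i] + (1 - self.decay) * e**2
            m.update(history, y)


# Hypothetical demo: column 0 is the target, column 1 is a leading exogenous sensor
rng = np.random.default_rng(0)
T = 500
data = np.zeros((T, 2))
for t in range(1, T):
    data[t, 1] = 0.5 * data[t - 1, 1] + rng.normal()
    data[t, 0] = 0.6 * data[t - 1, 0] + 0.8 * data[t - 1, 1] + 0.1 * rng.normal()

ens = PerformanceWeightedEnsemble([([0], 1), ([0], 2), ([0, 1], 1), ([0, 1], 2)])
for t in range(2, T):
    history = data[:t]
    forecast = ens.predict(history)  # one-step-ahead forecast before y_t arrives
    ens.update(history, data[t, 0])
w = ens.weights()
print(np.round(w, 3))
```

Because every model is updated in constant time per observation, the ensemble can run on a stream without refitting. After the demo, most of the weight sits on the two candidates that include the exogenous column.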
In natural language processing (NLP), the Transformer is widely used and has achieved state-of-the-art results in numerous tasks such as language modeling, summarization, and classification. A variational autoencoder (VAE), meanwhile, is an efficient generative model for representation learning that combines deep learning with statistical inference over encoded representations. However, using a VAE in NLP often brings practical difficulties such as posterior collapse, also known as Kullback–Leibler (KL) vanishing. Our goal is to mitigate this problem while taking advantage of the parallelizable processing of language data. To that end, we propose a new language representation model that integrates two seemingly different deep learning models: a Transformer, with no recurrent component, coupled with a VAE. We compare the proposed model with previous work, such as a VAE connected to a recurrent neural network (RNN). Our experiments on four real-life datasets show that KL annealing mitigates posterior collapse. The results also show that the proposed Transformer-based model outperforms RNN-based models in reconstruction and representation learning, and that its encoded representations are more informative than those of the other tested models.
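KL annealing can be illustrated with a minimal sketch, assuming a diagonal-Gaussian posterior, a standard-normal prior, and a linear warm-up schedule. The paper's actual schedule may differ, and `kl_weight`, `annealed_loss`, and `warmup_steps` are hypothetical names. Early in training, the small weight on the KL term lets the decoder learn to use the latent code before the KL penalty can push the posterior onto the prior, which is what posterior collapse means.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0, axis=-1)


def kl_weight(step, warmup_steps):
    """Linear KL annealing: the weight grows from 0 to 1 over `warmup_steps`, then stays at 1."""
    return min(1.0, step / warmup_steps)


def annealed_loss(recon_nll, mu, logvar, step, warmup_steps=10_000):
    """Negative ELBO with an annealed KL term, averaged over the batch."""
    beta = kl_weight(step, warmup_steps)
    return float(np.mean(recon_nll + beta * gaussian_kl(mu, logvar)))


# Example: a batch of one sentence with a 4-dimensional latent code
recon_nll = np.array([3.0])
mu = np.ones((1, 4))
logvar = np.zeros((1, 4))
print(gaussian_kl(mu, logvar))                      # [2.]
print(annealed_loss(recon_nll, mu, logvar, 0))      # 3.0
print(annealed_loss(recon_nll, mu, logvar, 5_000))  # 4.0
```

In a real model, `recon_nll`, `mu`, and `logvar` would come from the Transformer decoder and encoder. The schedule itself is independent of the architecture.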