Independent component analysis (ICA) decomposes multivariate data into mutually independent components (ICs). For model identifiability, the ICA model requires that at most one of these components be Gaussian. Linear non-Gaussian component analysis (LNGCA) generalizes the ICA model to a linear latent factor model with any number of both non-Gaussian components (signals) and Gaussian components (noise), where observations are linear combinations of independent components. Although the individual Gaussian components are not identifiable, the Gaussian subspace is identifiable. We introduce an estimator, along with its optimization approach, in which non-Gaussian and Gaussian components are estimated simultaneously: the discrepancy of each non-Gaussian component from Gaussianity is maximized while the discrepancy of each Gaussian component from Gaussianity is minimized. When the number of non-Gaussian components is unknown, we develop a statistical test to determine it, based on resampling and on the discrepancy of the estimated components. Through a variety of simulation studies, we demonstrate the improvements of our estimator over competing estimators, and we illustrate the effectiveness of our test in determining the number of non-Gaussian components. Further, we apply our method to real data examples and show its practical value.
KEYWORDS: dimension reduction, hypothesis testing, independent component analysis, multivariate analysis, projection pursuit, subspace estimation

1

Let H = Σ_Y^{−1/2} be an uncorrelating matrix. Let Z = HY = (Z_1, …, Z_p)^T ∈ R^p be a random vector of uncorrelated observations, such that Σ_Z = Cov(Z) = I_p, the p × p identity matrix. The ICA model further assumes that the components X_1, …, X_p are mutually independent, with at most one Gaussian component. Then the

Stat Anal Data Min: The ASA Data Sci Journal. 2019;12:141-156 wileyonlinelibrary.com/sam
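As a minimal numerical sketch of the uncorrelating (whitening) step described above, the following Python/NumPy snippet forms H = Σ_Y^{−1/2} via a symmetric eigendecomposition and checks that Z = HY has identity covariance. The mixing matrix, dimensions, and variable names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 5000
A = rng.normal(size=(p, p))          # arbitrary mixing matrix (assumption)
Y = A @ rng.normal(size=(p, n))      # p-variate observations, columns = samples

Sigma_Y = np.cov(Y)                  # sample covariance of Y

# Symmetric inverse square root: H = Sigma_Y^{-1/2} via eigendecomposition
vals, vecs = np.linalg.eigh(Sigma_Y)
H = vecs @ np.diag(vals ** -0.5) @ vecs.T

# Whitened data: centered, then uncorrelated with unit variances
Z = H @ (Y - Y.mean(axis=1, keepdims=True))

print(np.allclose(np.cov(Z), np.eye(p), atol=1e-6))
```

Since Cov(Z) = H Cov(Y) H^T = Σ_Y^{−1/2} Σ_Y Σ_Y^{−1/2} = I_p, the check holds up to floating-point error; any rotation of Z remains white, which is why further (non-Gaussianity-based) criteria are needed to separate the components.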