We propose an automated way of determining the optimal number of low-rank components in dimension reduction of image data. The method is based on the combination of two-dimensional principal component analysis and an augmentation estimator proposed recently in the literature. Intuitively, the main idea is to combine a scree plot with information extracted from the eigenvectors of a variation matrix. Simulation studies show that the method provides accurate estimates, and a demonstration with a finger data set showcases its performance in practice.
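The combination can be sketched as follows for ordinary (vector) PCA; in the two-dimensional version the same device is applied separately to the row and column covariance matrices of the images. This is a minimal illustration of the augmentation idea, not the estimator of the paper: the function name augmentation_order, the noise scale, and the exact way the leakage and scree terms are combined are all our assumptions.

```python
import numpy as np

def augmentation_order(X, r=5, m=50, rng=None):
    """Sketch of an augmentation-type order estimator (assumes n > p).

    Append r pure-noise columns to X, m times; eigenvectors of the
    augmented covariance that carry signal stay supported on the
    original p coordinates, so the mass an eigenvector places on the
    noise coordinates ("leakage") jumps once its index exceeds the
    latent dimension.  A scree term is added so that neither the
    eigenvalues nor the leakage alone decide the order.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    sigma = np.sqrt(np.mean(np.var(X, axis=0)))   # heuristic noise scale
    leak = np.zeros(p)
    for _ in range(m):
        Z = rng.normal(scale=sigma, size=(n, r))
        Xa = np.hstack([X, Z])
        Xa = Xa - Xa.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xa, full_matrices=False)
        leak += (Vt[:p, p:] ** 2).sum(axis=1)     # leakage of each eigenvector
    leak /= m
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    phi = lam / (1.0 + lam.sum())                 # normalized scree heights
    cum = np.concatenate(([0.0], np.cumsum(leak[:-1]))) / (1.0 + leak.sum())
    return int(np.argmin(cum + phi))              # candidate order k = 0..p-1
```

For candidate orders below the truth the scree term is large, and above the truth the accumulated leakage is large, so the minimizer sits at the latent dimension; this is the intuition of combining a scree plot with eigenvector variation.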
Tensor-valued data benefits greatly from dimension reduction, as the reduction in size is exponential in the number of modes. To achieve maximal reduction without loss of information, our objective in this work is to give an automated procedure for the optimal selection of the reduced dimensionality. Our approach combines a recently proposed data augmentation procedure with the higher-order singular value decomposition (HOSVD) in a tensorially natural way. We give theoretical guidelines on how to choose the tuning parameters and further inspect their influence in a simulation study. As our primary result, we show that the procedure consistently estimates the true latent dimensions under a noisy tensor model, both at the population and sample levels. Additionally, we propose a bootstrap-based alternative to the augmentation estimator. Simulations are used to demonstrate the estimation accuracy of the two methods under various settings.
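Assuming the hypothetical augmentation_order sketch above, a mode-wise HOSVD extension amounts to running that estimator once per mode, on the unfolding whose columns are that mode's fibers; the bootstrap-based alternative is not sketched here.

```python
import numpy as np

def hosvd_order(T, r=5, m=50, rng=None):
    """Sketch: estimate the latent dimension of each tensor mode by
    applying the augmentation_order sketch (defined earlier) to the
    mode-k unfolding, treating the mode-k fibers as observations."""
    ranks = []
    for k in range(T.ndim):
        Tk = np.moveaxis(T, k, 0).reshape(T.shape[k], -1)  # p_k x (prod of other dims)
        ranks.append(augmentation_order(Tk.T, r=r, m=m, rng=rng))
    return ranks
```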
Partial orderings and measures of information for continuous univariate random variables, with special roles for the Gaussian and uniform distributions, are discussed. Information measures and measures of non-Gaussianity, including the third and fourth cumulants, are generally used as projection indices in the projection pursuit approach to independent component analysis. The connections between information, non-Gaussianity and statistical independence in the context of independent component analysis are discussed in detail.

Keywords: dispersion, entropy, kurtosis, partial orderings

2 Some characteristics of a univariate distribution

Location, dispersion, skewness and kurtosis. We consider a continuous random variable x with finite mean E(x), finite variance Var(x), density function f and cumulative distribution function F. Location, dispersion, skewness and kurtosis are often considered by defining corresponding measures or functionals for these properties. Location and dispersion measures, written T(x) and S(x), are functions of the distribution of x and are defined as follows.

Definition 2.1. T(x) is a location measure and S(x) > 0 a dispersion measure if they are affine equivariant, that is, T(ax + b) = aT(x) + b and S(ax + b) = |a|S(x) for all constants a and b.

Clearly, if T is a location measure and x is symmetric around µ, then T(x) = µ for all location measures. For squared dispersion measures S², [10] considered the concepts of additivity, subadditivity and superadditivity. These concepts appear to be crucial in developing tools for independent component analysis and are defined as follows.

Definition 2.2. Let S² be a squared dispersion measure. Then S² is additive if S²(x_1 + x_2) = S²(x_1) + S²(x_2), subadditive if S²(x_1 + x_2) ≤ S²(x_1) + S²(x_2), and superadditive if S²(x_1 + x_2) ≥ S²(x_1) + S²(x_2) for all independent random variables x_1 and x_2.
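As concrete instances of these functionals (standard choices, with the symbols γ and κ being our notation rather than the paper's), one may take the mean and standard deviation together with the standardized third and fourth cumulants that serve as projection indices:

```latex
\[
  T(x) = E(x), \qquad S(x) = \sqrt{\operatorname{Var}(x)},
\]
\[
  \gamma(x) = E\!\left[\left(\frac{x - E(x)}{S(x)}\right)^{3}\right], \qquad
  \kappa(x) = E\!\left[\left(\frac{x - E(x)}{S(x)}\right)^{4}\right] - 3.
\]
% The variance is an additive squared dispersion measure:
% Var(x_1 + x_2) = Var(x_1) + Var(x_2) for independent x_1, x_2.
```

Both γ and κ vanish at the Gaussian distribution, which is what makes them usable as measures of non-Gaussianity.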
We study the estimation of the linear discriminant with projection pursuit, a method that is unsupervised in the sense that it does not use the class labels in the estimation. Our viewpoint is asymptotic and, as our main contribution, we derive central limit theorems for estimators based on three different projection indices: skewness, kurtosis, and their convex combination. The results show that in each case the limiting covariance matrix is proportional to that of linear discriminant analysis (LDA), a supervised estimator of the discriminant. An extensive comparison of the asymptotic variances reveals that projection pursuit gets arbitrarily close in efficiency to LDA when the distance between the groups is large enough and their proportions are reasonably balanced. Additionally, we show that consistent unsupervised estimation of the linear discriminant can be achieved also in high-dimensional regimes where the dimension grows at a suitable rate relative to the sample size; for example, p_n = o(n^{1/3}) is sufficient under skewness-based projection pursuit. We conclude with a real data example and a simulation study investigating the validity of the obtained asymptotic formulas for finite samples.
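A minimal sketch of the skewness-based index follows; the whitening step, the optimizer, and the function name pp_discriminant are our assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def pp_discriminant(X, restarts=10, rng=None):
    """Sketch of skewness-based projection pursuit: find the unit
    direction maximizing squared skewness of the projections of
    whitened data, without using any class labels."""
    rng = np.random.default_rng(rng)
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric whitener
    Z = Xc @ W

    def neg_skew2(u):
        u = u / np.linalg.norm(u)
        return -np.mean((Z @ u) ** 3) ** 2         # minus squared skewness

    best = min((minimize(neg_skew2, rng.standard_normal(X.shape[1]))
                for _ in range(restarts)), key=lambda res: res.fun)
    b = W @ (best.x / np.linalg.norm(best.x))      # back to original coordinates
    return b / np.linalg.norm(b)
```

In a two-group location mixture the population maximizer of this index is, up to sign, proportional to the LDA direction when the group proportions are unequal; with exactly balanced groups the third cumulant of the projection vanishes and the kurtosis-based index is needed instead.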
Dimension reduction is a common strategy in multivariate data analysis that seeks a subspace containing all the interesting features needed for the subsequent analysis. For this purpose, non-Gaussian component analysis divides the data into a non-Gaussian part, the signal, and a Gaussian part, the noise. We show that the simultaneous use of two scatter functionals accomplishes this division and suggest a bootstrap test for the dimension of the non-Gaussian subspace. Sequential application of the test can then, for example, be used to estimate the signal dimension.
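The two-scatter idea can be sketched with an assumed choice of functionals, the covariance matrix and the fourth-moment scatter (the FOBI pair): every Gaussian direction of whitened data produces the eigenvalue p + 2, so deviations from that value point to the non-Gaussian subspace.

```python
import numpy as np

def fobi_gaps(X):
    """Eigenvalue gaps of the fourth-moment scatter of whitened data:
    zero (in expectation) for Gaussian noise directions, nonzero for
    non-Gaussian signal directions."""
    n, p = X.shape
    Z = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    Z = Z @ evecs @ np.diag(evals ** -0.5)        # whitened data
    r2 = (Z ** 2).sum(axis=1)                     # squared norms
    S2 = (Z * r2[:, None]).T @ Z / n              # fourth-moment scatter
    return np.abs(np.linalg.eigvalsh(S2) - (p + 2))
```

A sequential bootstrap test for the hypothesis that the signal dimension is at most k could then resample from a model in which the p − k smallest-gap directions are replaced by Gaussian noise and compare the observed gaps with their bootstrap distribution.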