Abstract. In this paper, we propose a new methodology for PCA in high-dimension, low-sample-size (HDLSS) data situations. The idea is to estimate eigenvalues via the singular values of a cross data matrix. We establish consistency properties of the eigenvalue estimator, as well as its limiting distribution, when the dimension d and the sample size n both grow to infinity in such a way that n is much smaller than d. We apply the new methodology to estimating PC directions and PC scores in HDLSS data situations. We also apply these findings to a mixture model in order to classify a dataset into two clusters. We demonstrate how the new methodology performs using HDLSS data from a microarray study of prostate cancer.
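The abstract does not spell out how the cross data matrix is built, so the following is only a minimal sketch under stated assumptions: the n samples are split at random into two halves, each half is centered by its own mean, and the singular values of the scaled cross-product of the two halves serve as the eigenvalue estimates. The function name `cross_data_singular_values`, the random split, and the `sqrt((n1 - 1) * (n2 - 1))` scaling are illustrative choices, not details confirmed by the text above.

```python
import numpy as np


def cross_data_singular_values(X, seed=0):
    """Estimate eigenvalues from the singular values of a cross data matrix.

    X is a d x n array whose columns are samples (d >> n in the HDLSS setting).
    The split and the scaling below are assumptions made for illustration.
    """
    X = np.asarray(X, dtype=float)
    d, n = X.shape
    if n < 4:
        raise ValueError("need at least 4 samples to form two centered halves")

    # Assumed step: random split of the samples into two disjoint halves.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n1 = n // 2
    n2 = n - n1
    X1 = X[:, perm[:n1]]
    X2 = X[:, perm[n1:]]

    # Center each half by its own sample mean.
    X1c = X1 - X1.mean(axis=1, keepdims=True)
    X2c = X2 - X2.mean(axis=1, keepdims=True)

    # Cross data matrix (n1 x n2). Noise in the two halves is independent,
    # so it does not accumulate the way it does in X^T X.
    S_cross = X1c.T @ X2c / np.sqrt((n1 - 1) * (n2 - 1))

    # Singular values are returned in descending order.
    return np.linalg.svd(S_cross, compute_uv=False)
```

Calling `cross_data_singular_values(X)` on a d x n array returns min(n1, n2) non-negative values in descending order. On synthetic data with one large eigenvalue and unit-variance noise, the leading value should land near that eigenvalue, whereas the leading eigenvalue of the ordinary sample covariance matrix also picks up a noise term of order d/n.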
In this paper, we investigate sample eigenvalues, Principal Component (PC) directions, and PC scores when the dimension d and the sample size n both grow to infinity in such a way that n is much smaller than d. We consider general settings that include the case when the eigenvalues are all in the range of sphericity. We assume neither normality nor a ρ-mixing condition. We attempt to find a difference among the eigenvalues by choosing n with a suitable order of d. We give consistency properties for the sample eigenvalues, the PC directions, and the PC scores. We also show that a sample eigenvalue has a Gaussian limiting distribution when its population counterpart is of multiplicity one.