Principal component analysis (PCA), the most popular dimension-reduction technique, has been used to analyze high-dimensional data in many areas. It discovers the homogeneity within the data and creates a reduced feature space that captures as much information as possible from the original data. However, when the data have a group structure, PCA often fails to identify the group-specific patterns, which we refer to as sub-homogeneity in this study. Missing this group-specific information can lead to an unsatisfactory representation of the data from a particular group. Capturing both homogeneity and sub-homogeneity is therefore important in high-dimensional data analysis, but it poses a great challenge. In this study, we propose a novel iterative complement-clustering principal component analysis (CPCA) that iteratively estimates the homogeneity and sub-homogeneity. A principal component regression based clustering method is also introduced to provide reliable cluster information. Theoretically, we show that the proposed clustering approach correctly identifies the cluster membership under certain conditions. A simulation study and a real analysis of stock return data confirm the superior performance of the proposed methods. Supplementary materials, including R code for the simulation and the real data analysis, are available online.
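The alternating idea described above (a global PCA for the homogeneity, followed by clustering and group-wise PCA on the complement for the sub-homogeneity) can be illustrated with a minimal sketch. This is not the authors' CPCA algorithm or their principal component regression based clustering: the use of k-means on the residual space, the fixed ranks, and the stopping rule are illustrative assumptions only.

```python
# Illustrative sketch only; the residual k-means step, ranks, and iteration count
# are assumptions, not the paper's CPCA or its PC-regression-based clustering.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def iterative_cpca_sketch(X, r_common=2, r_group=1, n_groups=3, n_iter=10):
    """X: (n, p) data matrix with centred columns (an assumption of this sketch)."""
    X_group = np.zeros_like(X)            # current estimate of the sub-homogeneity
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        # Homogeneity: global PCA after removing the current group-specific part.
        common = PCA(n_components=r_common).fit(X - X_group)
        X_common = common.inverse_transform(common.transform(X - X_group))
        # The complement of the common structure carries the group information.
        residual = X - X_common
        # Cluster the complement space, then estimate sub-homogeneity per group.
        labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(residual)
        X_group = np.zeros_like(X)
        for g in range(n_groups):
            idx = labels == g
            sub = PCA(n_components=r_group).fit(residual[idx])
            X_group[idx] = sub.inverse_transform(sub.transform(residual[idx]))
    return labels, X_common, X_group
```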
High-dimensional autocovariance matrices play an important role in dimension reduction for high-dimensional time series. In this article, we establish the central limit theorem (CLT) for spiked eigenvalues of high-dimensional sample autocovariance matrices under general conditions. The spiked eigenvalues are allowed to diverge to infinity in a flexible way, with no restriction on the divergence order. Moreover, the number of spiked eigenvalues and the time lag of the autocovariance matrix considered in this study can be either fixed or tending to infinity as the dimension p and the time length T go to infinity together. As a further statistical application, a novel autocovariance test is proposed to detect the equivalence of spiked eigenvalues for two high-dimensional time series. Various simulation studies are conducted to support the theoretical findings. Furthermore, a hierarchical clustering approach based on the autocovariance test is constructed and applied to clustering mortality data from multiple countries.
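To make the object of study concrete, the following sketch computes the leading (potentially spiked) eigenvalues of a lag-tau sample autocovariance matrix. The symmetrisation Sigma_tau Sigma_tau^T is a common convention in this literature but is an assumption here, and no test calibration from the paper's CLT is attempted.

```python
# Illustrative sketch: leading eigenvalues of a symmetrised lag-tau sample
# autocovariance matrix; the symmetrisation choice is an assumption.
import numpy as np

def spiked_autocov_eigs(X, tau=1, k=3):
    """X: (T, p) observed time series; returns the k largest eigenvalues of the
    symmetrised lag-tau sample autocovariance matrix."""
    T, p = X.shape
    Xc = X - X.mean(axis=0)                    # centre each coordinate
    # Lag-tau sample autocovariance: (1/T) * sum_t x_{t+tau} x_t^T
    Sigma_tau = Xc[tau:].T @ Xc[:-tau] / T     # p x p, generally not symmetric
    M = Sigma_tau @ Sigma_tau.T                # symmetrised version
    return np.linalg.eigvalsh(M)[::-1][:k]     # largest k eigenvalues
```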
This paper proposes a new AR-sieve bootstrap approach for high-dimensional time series. The major challenge of applying classical bootstrap methods to high-dimensional time series is twofold: the curse of dimensionality and temporal dependence. To tackle this difficulty, we utilise factor modelling to reduce the dimension and capture the temporal dependence simultaneously. A factor-based bootstrap procedure is constructed, which conducts the AR-sieve bootstrap on the extracted low-dimensional common factor time series and then recovers the bootstrap samples for the original data from the factor model. Asymptotic properties for bootstrap mean statistics and extreme eigenvalues are established. Various simulations further demonstrate the advantages of the new AR-sieve bootstrap under high-dimensional scenarios. Finally, an empirical application to particulate matter (PM) concentration data is presented, where bootstrap confidence intervals for mean vectors and autocovariance matrices are provided.
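The three-step structure of the procedure (extract factors, AR-sieve bootstrap the low-dimensional factor series, recover the full-dimensional bootstrap sample through the factor model) can be sketched as follows. Estimating factors by PCA, fitting a univariate AR to each factor, and resampling the idiosyncratic residuals i.i.d. are simplifying assumptions of this sketch and need not match the paper's exact procedure or its AR-order choice.

```python
# Illustrative sketch of a factor-based AR-sieve bootstrap; PCA factor estimation,
# univariate AR fits, and i.i.d. idiosyncratic resampling are assumptions.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def factor_ar_sieve_bootstrap(X, r=2, ar_order=2, rng=None):
    """X: (T, p) time series. Returns one bootstrap replicate of shape (T, p)."""
    rng = np.random.default_rng(rng)
    T, p = X.shape
    Xc = X - X.mean(axis=0)
    # Step 1: estimate r common factors and loadings by PCA (via SVD).
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = U[:, :r] * s[:r] / np.sqrt(T)          # (T, r) estimated factors
    L = np.sqrt(T) * Vt[:r].T                  # (p, r) estimated loadings
    E = Xc - F @ L.T                           # idiosyncratic residuals
    # Step 2: AR-sieve bootstrap each low-dimensional factor series.
    F_star = np.empty_like(F)
    for j in range(r):
        fit = AutoReg(F[:, j], lags=ar_order).fit()
        resid = fit.resid - fit.resid.mean()
        f = list(F[:ar_order, j])              # warm start with observed values
        for _ in range(ar_order, T):
            eps = rng.choice(resid)            # resample centred AR residuals
            f.append(fit.params[0]
                     + np.dot(fit.params[1:], f[-ar_order:][::-1]) + eps)
        F_star[:, j] = f
    # Step 3: recover the bootstrap sample through the factor model.
    E_star = E[rng.integers(0, T, size=T)]     # i.i.d. resampled idiosyncratic part
    return F_star @ L.T + E_star + X.mean(axis=0)
```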