This paper considers the problem of estimating a high-dimensional (HD) covariance matrix when the sample size is smaller, or not much larger, than the dimensionality of the data, which may be very large. We develop a regularized sample covariance matrix (RSCM) estimator which can be applied in commonly occurring sparse data problems. The proposed RSCM estimator is based on estimators of the unknown optimal (oracle) shrinkage parameters that yield the minimum mean squared error (MMSE) between the RSCM and the true covariance matrix when the data is sampled from an unspecified elliptically symmetric distribution. We propose two variants of the RSCM estimator, which differ in how they estimate the underlying sphericity parameter involved in the theoretical optimal shrinkage parameter. The performance of the proposed RSCM estimators is evaluated with numerical simulation studies. In particular, when the sample sizes are low, the proposed RSCM estimators often show a significant improvement over the conventional RSCM estimator by Ledoit and Wolf (2004). We further evaluate the performance of the proposed estimators in classification and portfolio optimization problems with real data, where the proposed methods outperform the benchmark methods.
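As a rough illustration of the shrinkage idea in the abstract above, the following sketch shrinks the SCM toward a scaled identity matrix with a fixed, user-chosen weight. The convex-combination form, the function names, and the parameter `rho` are assumptions made for illustration; the abstract does not give the paper's estimator of the optimal shrinkage parameters, and none is implemented here. The `sphericity` helper uses the common definition p tr(Σ²)/tr(Σ)², which equals 1 for a scaled identity matrix.

```python
import numpy as np

def sample_covariance(X):
    """Unbiased sample covariance matrix of the rows of X (n samples, p variables)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def sphericity(Sigma):
    """Sphericity p * tr(Sigma^2) / tr(Sigma)^2, which lies between 1 and p."""
    p = Sigma.shape[0]
    return p * np.trace(Sigma @ Sigma) / np.trace(Sigma) ** 2

def shrink_to_identity(S, rho):
    """Convex combination of S and the scaled identity (tr(S)/p) * I.

    rho in [0, 1] is a hypothetical, hand-picked shrinkage weight,
    not the MMSE-optimal value discussed in the abstract.
    """
    p = S.shape[0]
    eta = np.trace(S) / p  # average eigenvalue of S, so the trace is preserved
    return (1.0 - rho) * S + rho * eta * np.eye(p)
```

With fewer samples than variables the SCM is singular, while any `rho > 0` gives a positive definite estimate with the same trace as the SCM.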
The estimation of covariance matrices of multiple classes with limited training data is a difficult problem. The sample covariance matrix (SCM) is known to perform poorly when the number of variables is large compared to the available number of samples. In order to reduce the mean squared error (MSE) of the SCM, regularized (shrinkage) SCM estimators are often used. In this work, we consider regularized SCM (RSCM) estimators for multiclass problems that couple together two different target matrices for regularization: the pooled (average) SCM of the classes and the scaled identity matrix. Regularization toward the pooled SCM is beneficial when the population covariances are similar, whereas regularization toward the identity matrix guarantees that the estimators are positive definite. We derive the MSE optimal tuning parameters for the estimators and propose a method for their estimation under the assumption that the class populations follow (unspecified) elliptical distributions with finite fourth-order moments. The MSE performance of the proposed coupled RSCMs is evaluated with simulations and in a regularized discriminant analysis (RDA) classification set-up on real data. The results based on three different real data sets indicate comparable performance to cross-validation but with a significant speed-up in computation time.
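The coupling of the two targets described above could be sketched as follows. This is an assumed two-step form (first blend the class SCM with the pooled SCM, then shrink the result toward a scaled identity); the abstract does not state the exact parameterization, so the weights `alpha` and `beta`, the sample-size weighting of the pooled SCM, and the function names are all hypothetical. The tuning parameters are passed in by hand rather than estimated.

```python
import numpy as np

def pooled_scm(scms, sizes):
    """Weighted average of class SCMs (weights proportional to class sample sizes)."""
    w = np.asarray(sizes, dtype=float)
    w = w / w.sum()
    return sum(wk * Sk for wk, Sk in zip(w, scms))

def coupled_rscm(S_k, S_pool, alpha, beta):
    """Two-target regularization of a class SCM (illustrative form only).

    alpha in [0, 1]: weight on the class SCM versus the pooled SCM.
    beta  in (0, 1]: weight on the blended matrix versus the scaled identity.
    """
    p = S_k.shape[0]
    S_alpha = alpha * S_k + (1.0 - alpha) * S_pool  # shrink toward pooled SCM
    eta = np.trace(S_alpha) / p                     # scale of the identity target
    return beta * S_alpha + (1.0 - beta) * eta * np.eye(p)
```

Setting `alpha = 1` and `beta = 1` recovers the class SCM, and any `beta < 1` yields a positive definite matrix even when both the class and pooled SCMs are singular.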
The paper considers the problem of estimating the covariance matrices of multiple classes in a low sample support condition, where the data dimensionality is comparable to, or larger than, the sample sizes of the available data sets. In such conditions, a common approach is to shrink the class sample covariance matrices (SCMs) towards the pooled SCM. The success of this approach hinges upon the ability to choose the optimal regularization parameter. Typically, a common regularization level is shared among the classes and determined via a procedure based on cross-validation. We use class-specific regularization levels since this enables the derivation of the optimal regularization parameter for each class in terms of the minimum mean squared error (MMSE). The optimal parameters depend on the true unknown class population covariances. Consistent estimators of the parameters can, however, be easily constructed under the assumption that the class populations follow (unspecified) elliptically symmetric distributions. We demonstrate the performance of the proposed method via a simulation study as well as via an application to discriminant analysis using both synthetic and real data sets.
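To indicate where regularized class covariance estimates enter a discriminant analysis rule, here is a minimal sketch of the standard Gaussian quadratic discriminant score. It is a textbook rule, not the paper's procedure: the class means, covariance estimates, and priors are assumed to be supplied by the caller, and the function names are hypothetical. The covariance estimates must be positive definite, which is one reason regularization is needed when the SCMs are singular.

```python
import numpy as np

def qda_scores(x, means, covs, priors):
    """Gaussian discriminant score of x for each class (larger is better)."""
    scores = []
    for mu, Sigma, prior in zip(means, covs, priors):
        d = x - mu
        _, logdet = np.linalg.slogdet(Sigma)   # log-determinant of Sigma
        maha = d @ np.linalg.solve(Sigma, d)   # squared Mahalanobis distance
        scores.append(-0.5 * maha - 0.5 * logdet + np.log(prior))
    return np.array(scores)

def classify(x, means, covs, priors):
    """Index of the class with the highest discriminant score."""
    return int(np.argmax(qda_scores(x, means, covs, priors)))
```

In a regularized setting, `covs` would hold one shrunk covariance estimate per class in place of the raw class SCMs.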
We consider the problem of estimating high-dimensional covariance matrices of K populations or classes in the setting where the sample sizes are comparable to the data dimension. We propose estimating each class covariance matrix as a distinct linear combination of all class sample covariance matrices. This approach is shown to reduce the estimation error when the sample sizes are limited and the true class covariance matrices share a somewhat similar structure. We develop an effective method for estimating the coefficients in the linear combination that minimize the mean squared error under the general assumption that the samples are drawn from (unspecified) elliptically symmetric distributions possessing finite fourth-order moments. To this end, we utilize the spatial sign covariance matrix, which we show (under rather general conditions) to be an unbiased estimator of the normalized covariance matrix as the dimension grows to infinity. We also show how the proposed method can be used in choosing the regularization parameters for multiple target matrices in a single class covariance matrix estimation problem. We assess the proposed method via numerical simulation studies including an application in global minimum variance portfolio optimization using real stock data.
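The two building blocks named above, the spatial sign covariance matrix and a linear combination of class SCMs, might be sketched as below. This assumes the usual definition of the spatial sign covariance matrix (the average outer product of unit-normalized, centered observations, which has trace 1) and uses the sample mean as a default center; the centering choice, the function names, and the coefficient matrix `A` are assumptions, and the coefficients are supplied by hand rather than estimated as in the paper.

```python
import numpy as np

def sscm(X, center=None):
    """Spatial sign covariance matrix of the rows of X; its trace equals 1.

    center defaults to the sample mean (an assumption; a robust location
    estimate could be passed in instead).
    """
    if center is None:
        center = X.mean(axis=0)
    D = X - center
    norms = np.linalg.norm(D, axis=1)
    keep = norms > 0                       # drop points equal to the center
    U = D[keep] / norms[keep, None]        # project observations to the unit sphere
    return U.T @ U / U.shape[0]

def combine_scms(scms, A):
    """Estimate k as sum_j A[k, j] * scms[j], for a K x K coefficient matrix A."""
    S = np.stack(scms)                     # shape (K, p, p)
    return np.einsum("kj,jab->kab", np.asarray(A, dtype=float), S)
```

Multiplying the output of `sscm` by the dimension p gives a matrix with trace p, matching the scale of a covariance matrix normalized to have trace p. Choosing `A` as the identity matrix in `combine_scms` returns the class SCMs unchanged.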