Although the standard formulations of prediction problems involve fully observed, noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly with dependence as well. We study these issues in the context of high-dimensional sparse linear regression and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able both to analyze the statistical error associated with any global optimum and, more surprisingly, to prove that a simple algorithm based on projected gradient descent converges in polynomial time to a small neighborhood of the set of all global minimizers. On the statistical side, we provide nonasymptotic bounds that hold with high probability for the cases of noisy, missing and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm is guaranteed to converge at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing close agreement with the predicted scalings.

Such an assumption is unrealistic for many applications, in which covariates may be observed only partially, observed subject to corruption, or exhibit some type of dependency. Consider the problem of modeling the voting behavior of politicians: in this setting, votes may be missing due to abstentions, and temporally dependent due to collusion or "tit-for-tat" behavior. Similarly, surveys often suffer from the missing data problem, since users fail to respond to all questions. Sensor network data also tend to be both noisy due to measurement error and partially missing due to failures or drop-outs of sensors. There are a variety of methods for dealing with noisy and/or missing data, including various heuristic methods, as well as likelihood-based methods involving the expectation-maximization (EM) algorithm (e.g., see the book [8] and references therein). A challenge in this context is the possible nonconvexity of the associated optimization problems. For instance, in applications of EM, problems in which the negative likelihood is a convex function often become nonconvex with missing or noisy data. Consequently, although the EM algorithm will converge to a local minimum, it is difficult to guarantee that the local optimum is close to a global minimum. In this paper, we study these issues in the context of high-dimensional sparse linear regression, in particular in the case when the predictors or covariates are noisy, missing, and/or dependent. Our main contribution is to develop and study simple methods for handling th…
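The estimator in this first paper admits a compact implementation sketch. The code below is our own illustration, not the authors' code: for the additive-noise case with corrupted covariates Z = X + W and known noise covariance Σ_W, the unbiased surrogates Γ̂ = ZᵀZ/n − Σ_W and γ̂ = Zᵀy/n replace the usual Gram matrix and cross-covariance, and projected gradient descent is run over an ℓ1-ball of radius R. The function names, step-size rule, and fixed iteration count are illustrative assumptions.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto {x : ||x||_1 <= radius}
    (sort-based algorithm of Duchi et al., 2008)."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def corrected_lasso_pgd(Z, y, Sigma_w, radius, n_iter=500):
    """Projected gradient descent on the corrected quadratic objective
    0.5 * b' Gamma b - gamma' b over the l1-ball of the given radius,
    where Gamma = Z'Z/n - Sigma_w and gamma = Z'y/n are unbiased
    surrogates for X'X/n and X'y/n under additive covariate noise."""
    n, p = Z.shape
    Gamma = Z.T @ Z / n - Sigma_w   # may be indefinite when n < p
    gamma = Z.T @ y / n
    step = 1.0 / max(np.linalg.eigvalsh(Gamma).max(), 1e-12)
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = project_l1_ball(beta - step * (Gamma @ beta - gamma), radius)
    return beta
```

Because Γ̂ is typically indefinite in the high-dimensional regime n < p, the objective is nonconvex; the ℓ1-ball constraint is the side condition under which the paper's geometric-convergence guarantee is stated.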
We study theoretical properties of regularized robust M-estimators, applicable when data are drawn from a sparse high-dimensional linear model and contaminated by heavy-tailed distributions and/or outliers in the additive errors and covariates. We first establish a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution: when the derivative of the loss function is bounded and satisfies a local restricted curvature condition, all stationary points within a constant radius of the true regression vector converge at the minimax rate enjoyed by the Lasso with sub-Gaussian errors. When an appropriate nonconvex regularizer is used in place of an ℓ1-penalty, we show that such stationary points are in fact unique and equal to the local oracle solution with the correct support; hence, results on asymptotic normality in the low-dimensional case carry over immediately to the high-dimensional setting. This has important implications for the efficiency of regularized nonconvex M-estimators when the errors are heavy-tailed. Our analysis of the local curvature of the loss function also has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region. We show that as long as a composite gradient descent algorithm is initialized within a constant radius of the true regression vector, successive iterates will converge at a linear rate to a stationary point within the local region. Furthermore, the global optimum of a convex regularized robust regression function may be used to obtain a suitable initialization. The result is a novel two-step procedure that uses a convex M-estimator to achieve consistency and a nonconvex M-estimator to increase efficiency. We conclude with simulation results that corroborate our theoretical findings.
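The two-step procedure can be sketched concretely, assuming (as one instance of the framework) a Huber loss with threshold δ and the MCP regularizer with parameter γ; the helper names, default constants, and step-size rule below are our own choices, not the paper's.

```python
import numpy as np

def huber_grad(r, delta):
    """Derivative of the Huber loss: bounded at +/- delta, hence robust."""
    return np.clip(r, -delta, delta)

def mcp_prox(z, lam, gamma, eta):
    """Closed-form proximal operator of the MCP penalty for step size eta
    (assumes gamma > eta)."""
    shrunk = np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0) / (1.0 - eta / gamma)
    return np.where(np.abs(z) <= gamma * lam, shrunk, z)

def two_step_robust(X, y, lam, delta=1.345, gamma=3.0, n_iter=500):
    """Convex pilot (Huber + l1), then nonconvex refinement (Huber + MCP)
    via composite gradient descent initialized at the pilot."""
    n, p = X.shape
    eta = n / np.linalg.norm(X, 2) ** 2   # 1/L for the Huber loss gradient
    beta = np.zeros(p)
    for _ in range(n_iter):               # step 1: convex M-estimator
        z = beta + eta * X.T @ huber_grad(y - X @ beta, delta) / n
        beta = np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)  # soft threshold
    for _ in range(n_iter):               # step 2: nonconvex refinement
        z = beta + eta * X.T @ huber_grad(y - X @ beta, delta) / n
        beta = mcp_prox(z, lam, gamma, eta)
    return beta
```

Per the theory above, the convex pilot lands within the local region of restricted curvature, and the second stage then converges linearly to the unique stationary point there, namely the local oracle solution.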
We demonstrate that the primal-dual witness proof method may be used to establish variable selection consistency and ℓ∞-bounds for sparse regression problems, even when the loss function and/or regularizer are nonconvex. Using this method, we derive two theorems concerning support recovery and ℓ∞-guarantees for the regression estimator in a general setting. Our results provide rigorous theoretical justification for the use of nonconvex regularization: for certain nonconvex regularizers with vanishing derivative away from the origin, support recovery consistency may be guaranteed without requiring the typical incoherence conditions present in ℓ1-based methods. We then derive several corollaries that illustrate the wide applicability of our method to analyzing composite objective functions involving losses such as least squares, nonconvex modified least squares for errors-in-variables linear regression, the negative log-likelihood for generalized linear models, and the graphical Lasso. We conclude with empirical studies to corroborate our theoretical predictions.
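The key property invoked here, a regularizer whose derivative vanishes away from the origin, is easy to see numerically. The snippet below (our own illustration) contrasts the ℓ1 derivative, which is constant at λ off the origin, with the MCP derivative, which decays to exactly zero for |t| ≥ γλ, so that large coefficients incur no shrinkage bias and no incoherence condition is needed.

```python
import numpy as np

def l1_deriv(t, lam):
    """Derivative of the l1 penalty: magnitude lam everywhere off the origin."""
    return lam * np.sign(t)

def mcp_deriv(t, lam, gamma):
    """Derivative of MCP: lam near the origin, linearly decaying,
    and exactly zero once |t| >= gamma * lam."""
    return np.sign(t) * np.maximum(lam - np.abs(t) / gamma, 0.0)

t = np.linspace(0.5, 4.5, 9)
print(l1_deriv(t, lam=1.0))              # constant slope 1.0 at every t
print(mcp_deriv(t, lam=1.0, gamma=3.0))  # decays linearly, zero for t >= 3
```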
We investigate the relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. The proof exploits a combination of ideas from the geometry of exponential families, junction tree theory and convex analysis. These population-level results have various consequences for graph selection methods, both known and novel, including a novel method for structure estimation for missing or corrupted observations. We provide nonasymptotic guarantees for such methods and illustrate the sharpness of these predictions via simulations.

…propensity to spread an infectious disease [28]. It is a classical corollary of the Hammersley-Clifford theorem [5, 15, 21] that zeros in the inverse covariance matrix of a multivariate Gaussian distribution indicate absent edges in the corresponding graphical model. This fact, combined with various types of statistical estimators suited to high dimensions, has been leveraged by many authors to recover the structure of a Gaussian graphical model when the edge set is sparse (see the papers [8, 27, 31, 38] and the references therein). Recently, Liu et al. [23] and Liu, Lafferty and Wasserman [24] introduced the notion of a nonparanormal distribution, which generalizes the Gaussian distribution by allowing for monotonic univariate transformations, and argued that the same structural properties of the inverse covariance matrix carry over to the nonparanormal; see also the related work of Xue and Zou [37] on copula transformations. However, for non-Gaussian graphical models, the question of whether a general relationship exists between conditional independence and the structure of the inverse covariance matrix remains unresolved. In this paper, we establish a number of interesting links between covariance matrices and the edge structure of an underlying graph in the case of discrete-valued random variables. (Although we specialize our treatment to multinomial random variables due to their widespread applicability, several of our results have straightforward generalizations to other types of exponential families.) Instead of only analyzing the standard covariance matrix, we show that it is often fruitful to augment the usual covariance matrix with higher-order interaction terms. Our main result has an interesting corollary …
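A minimal numerical check of the population-level claim for tree-structured models: sample a binary Markov chain, estimate the ordinary covariance of the vertex variables, and inspect its inverse, whose entries for non-adjacent pairs should be approximately zero. The sampler and parameter values below are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_binary_chain(n, p, theta=1.0):
    """n samples of a binary Markov chain X1 - X2 - ... - Xp (a tree),
    where each variable copies its neighbor with prob. sigmoid(theta)."""
    X = np.empty((n, p))
    X[:, 0] = rng.integers(0, 2, size=n)
    for j in range(1, p):
        stay = rng.random(n) < 1.0 / (1.0 + np.exp(-theta))
        X[:, j] = np.where(stay, X[:, j - 1], 1.0 - X[:, j - 1])
    return X

X = sample_binary_chain(n=200_000, p=5)
Theta = np.linalg.inv(np.cov(X, rowvar=False))
# For the chain, only entries (j, k) with |j - k| <= 1 should be nonzero.
print(np.round(Theta, 2))
```

For binary variables the vertex indicators coincide with the variables themselves, so on a tree no augmentation is needed; for graphs with cycles, the results above indicate that the covariance matrix should be augmented with higher-order interaction terms before inverting.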
New technologies that measure sparse molecular biomarkers from easily accessible bodily fluids (e.g. blood, urine, and saliva) are revolutionizing disease diagnostics and precision medicine. Microchip devices can measure more disease biomarkers with better sensitivity and specificity each year, but clinical interpretation of these biomarkers remains a challenge. Single biomarkers in 'liquid biopsy' often cannot accurately predict the state of a disease due to heterogeneity in phenotype and disease expression across individuals. To address this challenge, investigators are combining multiplexed measurements of different biomarkers that together define robust signatures for specific disease states. Machine learning is a useful tool to automatically discover and detect these signatures, especially as new technologies output increasing quantities of molecular data. In this paper, we review the state of the field of machine learning applied to molecular diagnostics and provide practical guidance to use this tool effectively and to avoid common pitfalls.