High-dimensional statistical learning (HDSL) has been widely applied in data analysis, operations research, and stochastic optimization. Despite the availability of multiple theoretical frameworks, most HDSL theories stipulate the following two conditions, which can be overly stringent: (a) sparsity and (b) restricted strong convexity (RSC). This paper generalizes both conditions through the use of the folded concave penalty (FCP); we show that, for an M-estimation problem in which (i) conventional sparsity is relaxed to approximate sparsity and (ii) the RSC is completely absent, FCP-based regularization still yields poly-logarithmic sample complexity: the size of the training data is only required to be poly-logarithmic in the problem dimensionality. This finding allows us to further understand two important paradigms that have received much less attention: high-dimensional nonsmooth learning and (deep) neural networks (NNs). For both problems, we show that the poly-logarithmic sample complexity can be maintained. Furthermore, when the NN is regularized with the FCP, the excess risk of a stationary point of the training formulation is strictly monotonic in the solution's suboptimality gap, providing the first theoretical evidence for the empirically observed consistency between generalization performance and optimization quality in NN training.
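To make the FCP-based regularization concrete, the minimal sketch below (in Python) writes out an FCP-regularized M-estimation objective using the minimax concave penalty (MCP), one standard instance of a folded concave penalty. The least-squares loss, function names, and parameter values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mcp_penalty(theta, lam=0.1, a=3.0):
    """Minimax concave penalty (MCP), one instance of a folded concave penalty (FCP)."""
    t = np.abs(theta)
    inner = lam * t - t**2 / (2.0 * a)   # concave ramp on |theta| <= a * lam
    outer = 0.5 * a * lam**2             # flat (constant) beyond a * lam
    return np.where(t <= a * lam, inner, outer).sum()

def fcp_objective(theta, X, y, lam=0.1, a=3.0):
    """FCP-regularized M-estimation objective with an illustrative least-squares loss."""
    residual = y - X @ theta
    return 0.5 * residual @ residual / len(y) + mcp_penalty(theta, lam, a)
```

Unlike the L1 penalty, the MCP flattens out for large coefficients, which is what lets the analysis relax exact sparsity to approximate sparsity without the usual RSC-type conditions.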
In “High-Dimensional Learning Under Approximate Sparsity with Applications to Nonsmooth Estimation and Regularized Neural Networks,” Liu, Ye, and Lee study a model-fitting problem in which there are far fewer data points than problem dimensions. Their particular focus is on scenarios in which the commonly imposed sparsity assumption is relaxed and the usual condition of restricted strong convexity is absent. The results show that generalization performance can still be ensured in such settings, even if the problem dimension grows exponentially. The authors further study the sample complexities of high-dimensional nonsmooth estimation and neural networks. For the latter in particular, they show that, with explicit regularization, a neural network is provably generalizable even when the sample size is only poly-logarithmic in the number of fitting parameters.
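For readers who want to see what the "explicit regularization" looks like in practice, the sketch below adds an MCP (folded concave) penalty on the weights of a small neural network during training. It assumes PyTorch; the architecture, loss, and hyperparameters are hypothetical placeholders rather than the authors' experimental setup.

```python
import torch
import torch.nn as nn

def mcp_penalty(w, lam=0.01, a=3.0):
    """MCP (a folded concave penalty) summed elementwise over a weight tensor."""
    t = w.abs()
    return torch.where(t <= a * lam,
                       lam * t - t**2 / (2.0 * a),
                       torch.full_like(t, 0.5 * a * lam**2)).sum()

# Illustrative over-parameterized network: many more weights than samples.
model = nn.Sequential(nn.Linear(1000, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def training_step(X, y):
    opt.zero_grad()
    fit_loss = nn.functional.mse_loss(model(X), y)
    reg = sum(mcp_penalty(p) for p in model.parameters())
    (fit_loss + reg).backward()  # stationary points of this regularized objective are what the theory covers
    opt.step()
    return fit_loss.item()
```

The design point being illustrated is that the penalty is added explicitly to the training objective, so the generalization guarantee is tied to stationary points of the regularized loss rather than to any implicit bias of the optimizer.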