High-dimensional statistical learning (HDSL) has been widely applied in data analysis, operations research, and stochastic optimization. Despite the availability of multiple theoretical frameworks, most HDSL theories stipulate two conditions that can be overly restrictive: (a) sparsity and (b) restricted strong convexity (RSC). This paper relaxes both conditions via the folded concave penalty (FCP): we show that, for an M-estimation problem in which (i) conventional sparsity is relaxed to approximate sparsity and (ii) the RSC is entirely absent, FCP-based regularization yields poly-logarithmic sample complexity, i.e., the required training sample size is only poly-logarithmic in the problem dimensionality. This finding allows us to better understand two important paradigms that have received far less attention: high-dimensional nonsmooth learning and (deep) neural networks (NNs). For both problems, we show that the poly-logarithmic sample complexity is maintained. Furthermore, when the FCP is integrated into NN training, the excess risk of a stationary point of the training formulation is strictly monotonic in the solution's suboptimality gap, providing the first theoretical evidence for the empirically observed consistency between generalization performance and optimization quality in training an NN.
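As a schematic sketch of the setup (the notation below, including the empirical loss $\mathcal{L}_n$, the tuning parameters $\lambda$ and $a$, and the choice of the minimax concave penalty as a representative folded concave penalty, is illustrative rather than the paper's specific formulation), the FCP-regularized M-estimator may be written as
\[
  \hat{\boldsymbol\beta} \in \operatorname*{arg\,min}_{\boldsymbol\beta \in \mathbb{R}^d}
  \; \mathcal{L}_n(\boldsymbol\beta) + \sum_{j=1}^{d} P_\lambda\!\big(|\beta_j|\big),
  \qquad
  P_\lambda(t) =
  \begin{cases}
    \lambda t - \dfrac{t^2}{2a}, & 0 \le t \le a\lambda,\\[4pt]
    \dfrac{a\lambda^2}{2}, & t > a\lambda,
  \end{cases}
\]
where $\mathcal{L}_n$ denotes the empirical loss computed on $n$ training samples, $\lambda > 0$ controls the penalization strength, and $a > 1$ controls the concavity of the penalty; the folded concave shape of $P_\lambda$ (as opposed to the convex $\ell_1$ penalty) is what drives the results summarized above.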