Spectral algorithms have been widely used and studied in learning theory and inverse problems. This paper is concerned with distributed spectral algorithms for handling big data, based on a divide-and-conquer approach. We present a learning theory for these distributed kernel-based learning algorithms in a regression framework, including error bounds and minimax-optimal learning rates, obtained by means of a novel integral operator approach and a second-order decomposition of inverse operators. Our quantitative estimates are given in terms of the regularity of the regression function, the effective dimension of the reproducing kernel Hilbert space, and the qualification of the filter function of the spectral algorithm. They require no eigenfunction or noise conditions and improve on existing results even for the classical family of spectral algorithms.
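As an illustration of the divide-and-conquer scheme (a minimal sketch, not the paper's exact estimator), the code below implements the simplest member of the spectral-algorithm family, kernel ridge regression with filter function g_λ(σ) = 1/(σ + λ): the sample is split into m disjoint subsets, a local estimator is computed on each, and the global estimator averages the local ones. The kernel choice, parameter values, and all names here are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between row-sample sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def local_krr(X, y, lam, bandwidth=1.0):
    """Kernel ridge regression: spectral algorithm with filter g(s) = 1/(s + lam)."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return lambda Xt: gaussian_kernel(Xt, X, bandwidth) @ alpha

def distributed_krr(X, y, m, lam, bandwidth=1.0):
    """Divide-and-conquer: train m local estimators, average their predictions."""
    parts = np.array_split(np.random.permutation(len(y)), m)
    local_fns = [local_krr(X[p], y[p], lam, bandwidth) for p in parts]
    return lambda Xt: np.mean([f(Xt) for f in local_fns], axis=0)

# Illustrative toy regression: smooth target plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(600, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(600)
f_hat = distributed_krr(X, y, m=6, lam=1e-3)
print(f_hat(np.linspace(-1, 1, 5)[:, None]))
```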
An extreme learning machine (ELM) is a feedforward neural network (FNN)-like learning system in which the connections with the output neurons are adjustable, while the connections with and within the hidden neurons are randomly fixed. Numerous applications have demonstrated the feasibility and high efficiency of ELM-like systems. It has remained open, however, whether this holds for general applications. In this two-part paper, we conduct a comprehensive feasibility analysis of ELM. In Part I, we answer the question by theoretically justifying the following: 1) for some suitable activation functions, such as polynomials, Nadaraya-Watson and sigmoid functions, ELM-like systems can attain the theoretical generalization bound of FNNs with all connections adjusted, i.e., they do not degrade the generalization capability of FNNs even when the connections with and within hidden neurons are randomly fixed; 2) the number of hidden neurons needed for an ELM-like system to achieve this bound can be estimated; and 3) whenever the activation function is a polynomial, the resulting hidden layer output matrix has full column rank, so the generalized inverse technique can be applied efficiently to solve an ELM-like system, and, in the nonpolynomial case, Tikhonov regularization can be applied to guarantee weak regularity without sacrificing the generalization capability. In Part II, however, we reveal a different aspect of the feasibility of ELM: there also exist activation functions for which the corresponding ELM degrades the generalization capability. The obtained results substantiate the feasibility and efficiency of ELM-like systems and yield various generalizations and improvements of these systems.
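The following is a minimal sketch of an ELM-like system in the sense described above, assuming a sigmoid activation: the input-to-hidden weights and biases are drawn at random and frozen, the hidden layer output matrix H is formed, and only the output weights are solved for, either via the Moore-Penrose generalized inverse or, as in point 3), via Tikhonov regularization. All parameter names and values are illustrative.

```python
import numpy as np

def elm_train(X, y, n_hidden=100, ridge=0.0, rng=None):
    """Fit an ELM-like system: random, frozen hidden weights; only output weights learned."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))   # random input-to-hidden weights (frozen)
    b = rng.standard_normal(n_hidden)        # random hidden biases (frozen)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden layer output matrix
    if ridge > 0:                            # Tikhonov-regularized output weights
        beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    else:                                    # Moore-Penrose generalized inverse
        beta = np.linalg.pinv(H) @ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Illustrative toy usage.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (500, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.05 * rng.standard_normal(500)
W, b, beta = elm_train(X, y, n_hidden=200, ridge=1e-6, rng=rng)
print(elm_predict(X[:5], W, b, beta))
```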
Along with the rapid development of deep learning in practice, theoretical explanations for its success are urgently needed. Generalization and expressivity are two widely used measures for quantifying the theoretical behavior of deep learning. Expressivity focuses on finding functions that are expressible by deep nets but cannot be approximated by shallow nets with a similar number of neurons; it usually implies large capacity. Generalization aims at deriving fast learning rates for deep nets; it usually requires small capacity to reduce the variance. Unlike previous studies of deep learning, which pursue either expressivity or generalization, we take both factors into account to explore the theoretical advantages of deep nets. For this purpose, we construct a deep net with two hidden layers possessing excellent expressivity in terms of localized and sparse approximation. Then, using the well-known covering number to measure capacity, we find that deep nets possess excellent expressive power (measured by localized and sparse approximation) without essentially enlarging the capacity of shallow nets. As a consequence, we derive near-optimal learning rates for implementing empirical risk minimization (ERM) on the constructed deep nets. These results theoretically exhibit the advantage of deep nets from a learning theory viewpoint.
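The paper's construction is not reproduced here, but the sketch below illustrates, with assumed illustrative constants, a standard way a net with two hidden sigmoidal layers achieves localized approximation: the first layer builds an approximate indicator of an interval in each coordinate, and the second layer thresholds their sum to obtain an approximate indicator of a cube, a localization that shallow ridge-function nets cannot realize with comparably few neurons.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def cube_indicator_net(X, a, b, K=50.0):
    """Two-hidden-layer net approximating the indicator of the cube [a, b]^d.

    Layer 1: per coordinate, sigmoid(K(x_i - a_i)) - sigmoid(K(x_i - b_i))
             is close to 1 inside [a_i, b_i] and close to 0 outside.
    Layer 2: one sigmoid thresholds the sum of coordinate bumps at d - 1/2,
             so the output is close to 1 only when every coordinate lies in
             its interval.
    """
    d = X.shape[1]
    bumps = sigmoid(K * (X - a)) - sigmoid(K * (X - b))  # first hidden layer, shape (n, d)
    return sigmoid(K * (bumps.sum(axis=1) - (d - 0.5)))  # second hidden layer

# Illustrative toy usage: indicator of [0.2, 0.6]^2.
X = np.array([[0.4, 0.4], [0.4, 0.9], [0.0, 0.0]])
a, b = np.array([0.2, 0.2]), np.array([0.6, 0.6])
print(cube_indicator_net(X, a, b))  # approximately [1, 0, 0]
```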
In recent studies on sparse modeling, non-convex penalties have received considerable attention due to their superior sparsity-inducing properties compared with their convex counterparts. Compared with convex optimization approaches, however, the convergence analysis of non-convex approaches is more challenging. In this paper, we study the convergence of a non-convex iterative thresholding algorithm for solving sparse recovery problems with a certain class of non-convex penalties whose corresponding thresholding functions are discontinuous, with jump discontinuities; we therefore call it the iterative jumping thresholding (IJT) algorithm. We first verify the finite-support and sign convergence of the IJT algorithm by exploiting this jump discontinuity. Under the additional assumption of an introduced restricted Kurdyka-Łojasiewicz (rKL) property, we then prove strong convergence of the IJT algorithm. Furthermore, we show that the IJT algorithm converges to a local minimizer at an asymptotically linear rate under some additional conditions. Moreover, we derive a computable a posteriori error estimate, which can be used to design practical termination rules for the algorithm. It should be pointed out that the lq quasi-norm (0 < q < 1) is an important subclass of the class of non-convex penalties studied in this paper. In particular, when applied to lq regularization, the IJT algorithm converges to a local minimizer at an asymptotically linear rate under certain concentration conditions. We also provide a set of simulations to support the correctness of the theoretical assertions and to compare the time efficiency of the IJT algorithm for lq regularization (q = 1/2, 2/3) with other typical algorithms such as the iterative reweighted least squares (IRLS) algorithm and the iterative reweighted l1 minimization (IRL1) algorithm.
Index Terms: Sparse regularization, non-convex optimization, iterative thresholding algorithm, lq regularization (0 < q < 1), Kurdyka-Łojasiewicz inequality
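As a concrete sketch of the IJT iteration for the special case q = 1/2, the code below uses the closed-form half-thresholding operator of Xu et al.; note that the operator is zero below a threshold and jumps to a nonzero value above it, which is the jump discontinuity the convergence analysis exploits. The test problem, step-size rule, and parameter values are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def half_threshold(t, lam_mu):
    """Half-thresholding operator (closed form for q = 1/2, Xu et al. 2012).

    Zero below the threshold, jumping to a nonzero value above it.
    """
    thresh = (54 ** (1 / 3) / 4) * lam_mu ** (2 / 3)
    out = np.zeros_like(t)
    big = np.abs(t) > thresh
    phi = np.arccos((lam_mu / 8.0) * (np.abs(t[big]) / 3.0) ** (-1.5))
    out[big] = (2.0 / 3.0) * t[big] * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out

def ijt_half(A, y, lam, n_iter=500):
    """IJT iteration for min ||Ax - y||^2 + lam * ||x||_{1/2}^{1/2} (q = 1/2)."""
    mu = 0.99 / np.linalg.norm(A, 2) ** 2   # step size below 1 / ||A||^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # gradient step on the data-fit term, then jumping thresholding
        x = half_threshold(x + mu * A.T @ (y - A @ x), lam * mu)
    return x

# Illustrative toy sparse recovery problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 200)) / np.sqrt(60)
x_true = np.zeros(200)
x_true[[3, 50, 120]] = [1.5, -2.0, 1.0]
y = A @ x_true + 0.01 * rng.standard_normal(60)
x_hat = ijt_half(A, y, lam=0.05)
print(np.nonzero(x_hat)[0], x_hat[np.nonzero(x_hat)])
```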