Deep convolutional neural networks (DCNNs) have achieved great empirical success in many fields such as natural language processing, computer vision, and pattern recognition. However, a theoretical understanding of the flexibility and adaptivity of DCNNs across learning tasks, and of their power for feature extraction, is still lacking. We propose a generic DCNN structure consisting of two groups of convolutional layers associated with two downsampling operators and a fully connected layer, determined by only three structural parameters. Our generic DCNNs can extract various features, including not only polynomial features but also general smooth features. We also show that our DCNNs circumvent the curse of dimensionality for target functions of compositional form with (symmetric) polynomial features, spatially sparse smooth features, and interaction features. These results demonstrate the expressive power of our DCNN structure, while model selection is easier than for other deep neural networks, since only three hyperparameters control the architecture.
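To make the architecture concrete, here is a minimal PyTorch sketch of the structure described above: two groups of 1-D convolutional layers, each followed by a downsampling operator, and a final fully connected layer. The parameter names (n1, n2, width) and the choice of max-pooling as the downsampling operator are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class GenericDCNN(nn.Module):
    def __init__(self, d: int, n1: int, n2: int, width: int, filter_size: int = 3):
        super().__init__()
        # First group: n1 convolutional layers, then a downsampling operator.
        group1 = []
        for _ in range(n1):
            group1.append(nn.Conv1d(1, 1, kernel_size=filter_size, padding=filter_size - 1))
            group1.append(nn.ReLU())
        group1.append(nn.MaxPool1d(kernel_size=2))  # downsampling (illustrative choice)
        self.group1 = nn.Sequential(*group1)
        # Second group: n2 convolutional layers, then a second downsampling operator.
        group2 = []
        for _ in range(n2):
            group2.append(nn.Conv1d(1, 1, kernel_size=filter_size, padding=filter_size - 1))
            group2.append(nn.ReLU())
        group2.append(nn.MaxPool1d(kernel_size=2))
        self.group2 = nn.Sequential(*group2)
        # Fully connected layer; its input size is found with a dry run.
        with torch.no_grad():
            m = self.group2(self.group1(torch.zeros(1, 1, d))).numel()
        self.fc = nn.Linear(m, width)
        self.out = nn.Linear(width, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.unsqueeze(1)  # (batch, 1, d)
        x = self.group2(self.group1(x))
        return self.out(torch.relu(self.fc(x.flatten(1))))

# The whole architecture is fixed by three structural parameters: n1, n2, width.
net = GenericDCNN(d=64, n1=2, n2=2, width=32)
print(net(torch.randn(8, 64)).shape)  # torch.Size([8, 1])
```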
Regularization schemes for regression have been widely studied in learning theory and inverse problems. In this paper, we study regularized distribution regression (DR), which involves two stages of sampling and aims to regress from probability measures to real-valued responses by regularization over a reproducing kernel Hilbert space. Many important tasks in statistical learning and inverse problems can be treated in this framework; examples include multi-instance learning and point estimation for problems without analytical solutions. Recently, theoretical analysis of DR has been carried out via kernel ridge regression, and several interesting learning behaviors have been observed. However, the topic has not been explored or understood beyond least-squares-based DR. By introducing a robust loss function $\ell_\sigma$ for two-stage sampling problems, we present a novel robust distribution regression (RDR) scheme. With an appropriately chosen windowing function $V$ and scaling parameter $\sigma$, $\ell_\sigma$ covers a wide range of commonly used loss functions, enriching the theme of DR. Moreover, $\ell_\sigma$ is not necessarily convex, which enlarges the regression class (least squares) considered in the DR literature. Learning rates in different regularity ranges of the regression function are comprehensively derived via integral operator techniques. The scaling parameter $\sigma$ is shown to be crucial for the robustness and satisfactory learning rates of RDR.
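As a concrete illustration of such a windowed loss, the numpy sketch below takes $\ell_\sigma(t) = \sigma^2 V(t^2/\sigma^2)$ with the Welsch window $V(u) = 1 - e^{-u}$; this specific choice of $V$ is an assumption for illustration, not the paper's only option. It shows the role of $\sigma$: as $\sigma$ grows, $\ell_\sigma$ approaches the least squares loss $t^2$, while for small $\sigma$ large residuals are bounded, which is the source of robustness to outliers.

```python
import numpy as np

def robust_loss(t: np.ndarray, sigma: float) -> np.ndarray:
    """Welsch-type windowed loss: l_sigma(t) = sigma^2 * (1 - exp(-t^2 / sigma^2))."""
    return sigma**2 * (1.0 - np.exp(-(t**2) / sigma**2))

residuals = np.array([0.1, 1.0, 10.0])
print(robust_loss(residuals, sigma=1.0))    # large residuals saturate near sigma^2
print(robust_loss(residuals, sigma=100.0))  # nearly residuals**2 (least squares)
```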
We consider a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer. The network structure depends on two structural parameters, which determine the number of convolutional layers and the width of the fully connected layer. We establish an approximation theory with explicit approximation rates when the approximated function takes the composite form $f \circ Q$ with a feature polynomial $Q$ and a univariate function $f$. In particular, we prove that such a network can outperform fully connected shallow networks in approximating radial functions with $Q(x) = |x|^2$ when the dimension $d$ of data from $\mathbb{R}^d$ is large. This gives the first rigorous proof of the superiority of deep convolutional neural networks in approximating functions with special structures. We then carry out a generalization analysis for empirical risk minimization with such a deep network in a regression framework where the regression function has the form $f \circ Q$. Our network structure, which does not use any composite information about the functions $Q$ and $f$, can automatically extract features and exploit the composite nature of the regression function via tuning of the structural parameters. Our analysis provides an error bound that decreases with the network depth to a minimum and then increases, theoretically verifying a trade-off phenomenon observed for network depths in many practical applications.
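The following sketch illustrates the regression setting analyzed above: data generated by a composite regression function $f(Q(x))$ with $Q(x) = |x|^2$, fitted by empirical risk minimization with a small 1-D convolutional network. The architecture and training setup below are simplified stand-ins (an assumption), chosen only to show that the learner sees raw pairs $(x, y)$ and never the factors $f$ or $Q$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 16, 512
x = torch.randn(n, d)
f = lambda t: torch.exp(-t / d)                 # univariate outer function
y = f((x**2).sum(dim=1, keepdim=True)) + 0.05 * torch.randn(n, 1)  # radial target

model = nn.Sequential(
    nn.Conv1d(1, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv1d(4, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool1d(2),                            # downsampling operator
    nn.Flatten(),
    nn.Linear(4 * (d // 2), 32), nn.ReLU(),     # fully connected layer
    nn.Linear(32, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):                         # empirical risk minimization
    opt.zero_grad()
    loss = ((model(x.unsqueeze(1)) - y)**2).mean()
    loss.backward()
    opt.step()
print(f"final empirical risk: {loss.item():.4f}")
```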