Deep learning has been widely applied and has brought breakthroughs in speech recognition, computer vision, and many other domains. Deep neural network architectures and the associated computational issues have been well studied in machine learning. But a theoretical foundation for understanding the approximation or generalization ability of deep learning methods generated by network architectures such as deep convolutional neural networks is still lacking. Here we show that a deep convolutional neural network (CNN) is universal, meaning that it can approximate any continuous function to arbitrary accuracy when the depth of the network is large enough. This answers an open question in learning theory. Our quantitative estimate, given tightly in terms of the number of free parameters to be computed, verifies the efficiency of deep CNNs in dealing with high-dimensional data. Our study also demonstrates the role of convolutions in deep CNNs.
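As a loose illustration of the approximation statement above (and not the network construction analyzed in the paper), the following sketch trains a small stack of 1-D convolutional layers to fit a continuous function. PyTorch, the single-channel architecture, the depth J, the filter length s, and the target function are all assumptions made for the example.

```python
# A minimal sketch, assuming PyTorch: fitting a continuous function on
# [0, 1]^d with a stack of 1-D convolutional layers.  The single-channel
# architecture, depth J, filter length s, and target function are
# illustrative choices, not the construction analyzed in the paper.
import torch
import torch.nn as nn

d, s, J = 16, 3, 6  # input dimension, filter length, depth

class DeepConvNet(nn.Module):
    def __init__(self, d, s, J):
        super().__init__()
        layers = []
        for _ in range(J):
            # 1-D convolution with "same" padding, followed by ReLU.
            layers += [nn.Conv1d(1, 1, kernel_size=s, padding=s // 2), nn.ReLU()]
        self.conv = nn.Sequential(*layers)
        self.readout = nn.Linear(d, 1)  # final linear layer

    def forward(self, x):                  # x: (batch, d)
        h = self.conv(x.unsqueeze(1))      # (batch, 1, d)
        return self.readout(h.squeeze(1))  # (batch, 1)

# Continuous target on [0, 1]^d, e.g. f(x) = sin(sum_i x_i).
X = torch.rand(2048, d)
y = torch.sin(X.sum(dim=1, keepdim=True))

model = DeepConvNet(d, s, J)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()
print(f"training mean-squared error: {loss.item():.4f}")
```

In the paper's result, accuracy improves as depth grows; this tiny fixed-depth sketch only shows the architectural form, not the quantitative rates.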
The covering number of a ball of a reproducing kernel Hilbert space, viewed as a subset of the space of continuous functions, plays an important role in learning theory. We give estimates for this covering number by means of the regularity of the Mercer kernel K. For convolution-type kernels K(x, t) = k(x − t) on [0, 1]^n, we provide estimates depending on the decay of k̂, the Fourier transform of k. In particular, when k̂ decays exponentially, our estimate for this covering number is better than all previous results and covers many important Mercer kernels. A counterexample is presented to show that the eigenfunctions of the Hilbert–Schmidt operator L_K associated with a Mercer kernel K may not be uniformly bounded; hence some previous methods used for estimating the covering number in learning theory are not valid. We also provide an example of a Mercer kernel showing that L_K^{1/2} may not be generated by a Mercer kernel.
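For reference, the objects appearing in this abstract can be written out as follows; the definitions are standard, and the paper's specific decay conditions and covering-number bounds are not restated here.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Convolution-type Mercer kernel on the cube and the Fourier transform of k:
\[
  K(x,t) = k(x - t), \qquad x, t \in [0,1]^n, \qquad
  \widehat{k}(\xi) = \int_{\mathbb{R}^n} k(u)\, e^{-i\,\xi\cdot u}\, du .
\]
% The ball of radius R in the reproducing kernel Hilbert space H_K of K,
% viewed as a subset of C([0,1]^n) with the uniform norm:
\[
  B_R = \bigl\{\, f \in \mathcal{H}_K : \|f\|_K \le R \,\bigr\};
\]
% the covering number N(B_R, eta) is the minimal number of balls of radius
% eta in the uniform norm needed to cover B_R.  The paper's estimates bound
% N(B_R, eta) in terms of the decay of \widehat{k}.
\end{document}
```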
Preamble. I first met René at the well-known 1956 meeting on topology in Mexico City. He then came to the University of Chicago, where I was starting my job as instructor for the fall of 1956. He, Suzanne, Clara and I became good friends and saw much of each other for many decades, especially at IHES in Paris. Thom's encouragement and support were important for me, especially in my first years after my Ph.D. I studied his work in cobordism, singularities of maps, and transversality, gaining many insights. I also enjoyed listening to his provocations, for example his disparaging remarks on complex analysis, 19th century mathematics, and Bourbaki. There was also a stormy side in our relationship. Neither of us could hide the pain that our public conflicts over "catastrophe theory" caused. René Thom was a great mathematician, leaving his impact on a wide part of mathematics. I will always treasure my memories of him.
We continue our study [12] of Shannon sampling and function reconstruction. In this paper, the error analysis is improved. The problem of function reconstruction is extended to a more general setting with frames beyond point evaluation. Then we show how our approach can be applied to learning theory: a functional analysis framework is presented; sharp, dimension-independent probability estimates are given not only for the error in the L^2 spaces, but also for the error in the reproducing kernel Hilbert space where the learning algorithm is performed. Covering number arguments are replaced by estimates of integral operators.
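As one concrete, much narrower instance of reconstructing a function from point samples in a reproducing kernel Hilbert space (the paper's frame-based setting is more general), the sketch below runs regularized least squares with a Gaussian kernel; the kernel, the regularization parameter, the sample size, and the target function are illustrative assumptions, not the paper's estimator.

```python
# Illustrative sketch only (assumed setup, not the paper's frame-based
# scheme): reconstructing a function from noisy point samples by regularized
# least squares in a reproducing kernel Hilbert space with a Gaussian kernel.
# The kernel width sigma, regularization lambda_reg, and target are arbitrary.
import numpy as np

def gaussian_kernel(x, t, sigma=0.2):
    """Gram matrix K[i, j] = exp(-(x_i - t_j)^2 / (2 sigma^2))."""
    return np.exp(-(x[:, None] - t[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
m = 40
x_samples = np.sort(rng.uniform(0.0, 1.0, m))                 # sample points
y_samples = np.sin(2 * np.pi * x_samples) + 0.05 * rng.standard_normal(m)

# Solve (K + lambda m I) c = y; the reconstruction is f(x) = sum_i c_i K(x, x_i).
lambda_reg = 1e-3
K = gaussian_kernel(x_samples, x_samples)
c = np.linalg.solve(K + lambda_reg * m * np.eye(m), y_samples)

x_grid = np.linspace(0.0, 1.0, 200)
f_rec = gaussian_kernel(x_grid, x_samples) @ c
print("max pointwise error:", np.max(np.abs(f_rec - np.sin(2 * np.pi * x_grid))))
```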