Deep Net Tree Structure for Balance of Capacity and Approximation Ability

Chui, Charles K.; Lin, Shao Bo; Zhou, Ding

doi:10.3389/fams.2019.00046

Cited by 8 publications

(4 citation statements)

References 34 publications

“…x_i) − y_i)^2 over a compact subset H of C(X), which can be verified with the same proof as that of [4, Theorem 2]. Lemma 11.…”

mentioning

confidence: 53%

“…All the above estimates on approximation by deep neural networks, structured or fully connected, are stated in terms of the smoothness of the approximated function. Approximating radial functions by fully-connected neural networks was studied in [20,3,4], while representing functions with variables having given compositional structures by fully-connected networks designed based on the known compositional structures was considered in [22,27].…”

Section: Generalization Analysis of DCNNs

mentioning

confidence: 99%

“…+ x_d^2. They arise naturally in statistical physics, early warning of earthquakes, 3-D point-cloud segmentation, and image rendering, and their learning by fully connected neural networks was studied in [20,3,4].…”

Section: Introduction

mentioning

confidence: 99%

Theory of Deep Convolutional Neural Networks III: Approximating Radial Functions

Mao

Shi

2021

Preprint

Self Cite

We consider a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer. The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer. We establish an approximation theory with explicit approximation rates when the approximated function takes a composite form f ∘ Q with a feature polynomial Q and a univariate function f. In particular, we prove that such a network can outperform fully connected shallow networks in approximating radial functions with Q(x) = |x|^2, when the dimension d of data from ℝ^d is large. This gives the first rigorous proof for the superiority of deep convolutional neural networks in approximating functions with special structures. Then we carry out generalization analysis for empirical risk minimization with such a deep network in a regression framework with the regression function of the form f ∘ Q. Our network structure, which does not use any composite information or the functions Q and f, can automatically extract features and make use of the composite nature of the regression function via tuning the structural parameters. Our analysis provides an error bound which decreases with the network depth to a minimum and then increases, verifying theoretically a trade-off phenomenon observed for network depths in many practical applications.

“…where For example, taking the special form of Toeplitz-type weight matrices leads to the deep convolutional nets [47], [48], [49], full matrices correspond to deep fully connected nets [12], and tree-type sparse matrices imply deep nets with tree structures [5], [6]. In this paper, we do not focus on the structure selection of deep nets, but rather on the existence of some deep net structure for realization of the sampling theorem established in Theorem 1.…”

Section: A Deep ReLU Nets

mentioning

confidence: 99%

Realization of spatial sparseness by deep ReLU nets with massive data

Chui

Lin

Zhang

2019

Preprint

Self Cite

The great success of deep learning poses urgent challenges for understanding its working mechanism and rationality. The depth, structure, and massive size of the data are recognized to be three key ingredients for deep learning. Most of the recent theoretical studies for deep learning focus on the necessity and advantages of depth and structures of neural networks. In this paper, we aim at rigorous verification of the importance of massive data in embodying the out-performance of deep learning. To approximate and learn spatially sparse and smooth functions, we establish a novel sampling theorem in learning theory to show the necessity of massive data. We then prove that implementing the classical empirical risk minimization on some deep nets facilitates in realization of the optimal learning rates derived in the sampling theorem. This perhaps explains why deep learning performs so well in the era of big data.
