2016
DOI: 10.1609/aaai.v30i1.10243

On the Depth of Deep Neural Networks: A Theoretical View

Abstract: People believe that depth plays an important role in the success of deep neural networks (DNN). However, as far as we know, this belief lacks solid theoretical justification. We investigate the role of depth from the perspective of the margin bound. In the margin bound, the expected error is upper bounded by the empirical margin error plus a Rademacher Average (RA) based capacity term. First, we derive an upper bound for the RA of DNN and show that it increases with depth. This indicates a negative impact of depth on test performance…
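For context, the margin bound referenced in the abstract typically takes the following form (this is the textbook Rademacher-based margin bound; the constants and the exact statement used in the paper may differ): with probability at least $1-\delta$ over an i.i.d. sample of size $m$, for every classifier $f$ in the hypothesis class $\mathcal{F}$ and every margin $\gamma > 0$,

$$\Pr\big[\, y f(x) \le 0 \,\big] \;\le\; \widehat{\mathrm{err}}_\gamma(f) \;+\; \frac{2}{\gamma}\,\mathfrak{R}_m(\mathcal{F}) \;+\; \sqrt{\frac{\ln(1/\delta)}{2m}},$$

where $\widehat{\mathrm{err}}_\gamma(f)$ is the fraction of training examples with margin below $\gamma$ and $\mathfrak{R}_m(\mathcal{F})$ is the Rademacher Average of the class. The paper's first result is an upper bound on $\mathfrak{R}_m(\mathcal{F})$ for DNNs that grows with depth, i.e., the capacity term is the side of the bound on which depth hurts.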

Cited by 83 publications (15 citation statements)
References 27 publications
“…Why do deeper networks perform worse in this task than the VGG13? In a recent theoretical study (Sun et al., 2016) of the depth of neural networks, it is shown that as the network deepens, the Rademacher Average (RA; a measurement of complexity) increases accordingly, which has some negative effects on the network. From the information acquisition point of view, deeper networks may be overfitting in learning and learning something unimportant, so the deeper networks do not seem to be able to perform simple tasks on par with computationally less sophisticated networks.…”
Section: Discussion
confidence: 99%
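The cited claim, that the Rademacher Average grows as the network deepens, can be illustrated numerically with a crude Monte Carlo proxy. The sketch below (Python with NumPy; the class of random-weight ReLU networks, the width, weight scale, and sample sizes are illustrative assumptions, and the max over sampled networks only lower-bounds the true supremum, so this is not the bound derived by Sun et al.) shows the mechanism: once per-layer weight magnitudes exceed the variance-preserving scale, the estimate grows with depth, mirroring the product-of-norms factor that appears in Rademacher bounds for DNNs.

import numpy as np

rng = np.random.default_rng(0)

def random_relu_net(depth, width, d_in, gain=2.0):
    # Sample one ReLU network f: R^{d_in} -> R with `depth` hidden layers.
    # A gain above sqrt(2) (relative to the 1/sqrt(fan_in) scale) makes layer outputs grow with depth.
    dims = [d_in] + [width] * depth + [1]
    Ws = [rng.normal(0.0, gain / np.sqrt(dims[i]), size=(dims[i], dims[i + 1]))
          for i in range(len(dims) - 1)]
    def f(X):
        H = X
        for W in Ws[:-1]:
            H = np.maximum(H @ W, 0.0)   # hidden ReLU layers
        return (H @ Ws[-1]).ravel()      # linear output
    return f

def rademacher_proxy(depth, m=200, d_in=10, n_nets=300, n_sigma=20, width=32):
    # Crude proxy for the empirical Rademacher Average on a fixed sample:
    # the supremum over the function class is replaced by a max over sampled networks.
    X = rng.normal(size=(m, d_in))
    outs = np.stack([random_relu_net(depth, width, d_in)(X) for _ in range(n_nets)])
    estimates = []
    for _ in range(n_sigma):
        sigma = rng.choice([-1.0, 1.0], size=m)       # Rademacher signs
        estimates.append(np.max(outs @ sigma) / m)    # best correlation with the signs
    return float(np.mean(estimates))

for depth in (1, 2, 4, 8):
    print(f"depth={depth}: RA proxy ~ {rademacher_proxy(depth):.3f}")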
“…For instance, the deep network shown in Figure 1b involves complex computations with many parameters and intermediate data with high latency and energy consumption, which is not appropriate for low-cost resource-constrained devices. There is a tradeoff between these performances, however; deeper networks or increasing the depth of networks is not always good [40]. Inspired by this, we have conducted model reduction for the model to be lightweight by reducing the depth of the model and involving the intermediate maxpool functions in convolutions with less computation complexity and low latency.…”
Section: Proposed Model Optimization
confidence: 99%
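The folding of the intermediate maxpool into the convolutions described in this passage is, in the most common reading, a strided convolution. A minimal sketch, assuming PyTorch is available (the channel counts, kernel size, and input shape are illustrative, not the cited paper's actual architecture):

import torch
import torch.nn as nn

# Baseline block: convolution followed by a separate max-pooling step.
baseline = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
)

# Reduced block: the 2x downsampling is absorbed into the convolution's stride,
# removing the intermediate pooling pass over the full-resolution feature map.
reduced = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 64, 64)
print(baseline(x).shape, reduced(x).shape)  # both are (1, 16, 32, 32)

The two blocks are shape-equivalent rather than numerically equivalent: the strided variant trades a little representational detail for fewer operations, less intermediate data, and lower latency, which is the tradeoff the cited work targets.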
“…For neural networks, however, the hypothesis space is large and combinatorially explodes in size with the neural network width and depth, making the corresponding bounds loose (cf. Bartlett et al., 1998; Harvey et al., 2017; Bartlett et al., 2019; Sun et al., 2016). Uniform bounds that utilize the parametric characterization of the network also grow rapidly with the size of the neural network (e.g., Neyshabur et al., 2015).…”
Section: Related Work
confidence: 99%
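To make the growth-with-size point concrete on the parameter-counting side, a small Python sketch (illustrative only: it counts weights and biases of a plain fully connected ReLU network, not any model from the cited works) shows how the quantities that parametric uniform bounds typically scale with grow as depth increases at a fixed width:

def mlp_param_count(d_in, width, depth, d_out=1):
    # Weights + biases of a fully connected network with `depth` hidden layers of size `width`.
    dims = [d_in] + [width] * depth + [d_out]
    return sum(dims[i] * dims[i + 1] + dims[i + 1] for i in range(len(dims) - 1))

for depth in (1, 2, 4, 8, 16):
    print(f"depth={depth:2d}: {mlp_param_count(d_in=784, width=256, depth=depth):,} parameters")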