Developing efficient on-the-edge Deep Learning (DL) applications is a challenging and non-trivial task: first, different DL models, each with its own trade-off between accuracy and complexity, need to be explored; second, various optimization options, frameworks, and libraries need to be evaluated; and third, a wide range of edge devices with different computation and memory constraints is available. As such, trade-offs arise among inference time, energy consumption, efficiency (throughput/Watt), and value (throughput/dollar). To shed some light on this problem, a case study is presented in which seven Image Classification (IC) and six Object Detection (OD) State-of-the-Art (SOTA) DL models are used to detect face masks on the following commercial off-the-shelf edge devices: Raspberry Pi 4, Intel Neural Compute Stick 2, Jetson Nano, Jetson Xavier NX, and i.MX 8M Plus. First, a full end-to-end video-pipeline architecture for face mask wearing detection is developed. Then, the thirteen DL models are optimized, evaluated, and compared on the edge devices in terms of accuracy and inference time. To leverage the computational power of the edge devices, the models are optimized, first, by using SOTA optimization frameworks (TensorFlow Lite, OpenVINO, TensorRT, eIQ) and, second, by evaluating and comparing different optimization options, e.g., different levels of quantization. The five edge devices themselves are also evaluated and compared in terms of inference time, value, and efficiency. Last, we derive insightful observations on which optimization frameworks, libraries, and options to use, and on how to select the right device depending on the target metric (inference time, efficiency, or value). For example, we show that the Jetson Xavier NX is the best platform in terms of latency and efficiency (FPS/Watt), while the Jetson Nano is the best in terms of value (FPS/$).
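To make the quantization options concrete, below is a minimal sketch of post-training quantization with TensorFlow Lite, one of the optimization frameworks evaluated in the study. The SavedModel path, input shape, and calibration data are hypothetical placeholders rather than the paper's actual pipeline; the other frameworks (OpenVINO, TensorRT, eIQ) expose analogous converter workflows.

```python
# Minimal sketch of one optimization option discussed above: post-training
# quantization with TensorFlow Lite. The model path, input resolution, and
# calibration data are hypothetical placeholders, not the paper's pipeline.
import numpy as np
import tensorflow as tf

SAVED_MODEL_DIR = "face_mask_classifier/"  # hypothetical trained SavedModel

# Stand-in calibration data; in practice, use a few hundred real training
# images with the same preprocessing the model expects.
calibration_images = np.random.rand(200, 224, 224, 3).astype(np.float32)

def representative_dataset():
    for image in calibration_images:
        yield [image[None, ...]]  # add batch dimension: (1, 224, 224, 3)

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)

# Dynamic-range quantization: quantizes weights only, no calibration needed.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Full-integer (INT8) quantization: required by integer-only accelerators;
# uses the representative dataset above to calibrate activation ranges.
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("face_mask_classifier_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

The device-level metrics compared in the study are then simple ratios over the measured throughput of such optimized models: efficiency = FPS / power draw (Watt) and value = FPS / device price ($).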