2020
DOI: 10.1109/tcsvt.2019.2903421

CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams

Abstract: The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets. Applying these networks to images demands a high computational effort and pushes the use of state-of-the-art networks on real-time video data out of reach of embedded platforms. Many recent works focus on reducing network complexity for real-time inference on embedded computing platforms. We adopt an ortho…

Cited by 30 publications (31 citation statements); References 47 publications.
“…In a different line of work to ours, CBinfer [14] uses a software-level solution to increase the number of unchanged pixels between consecutive frames by comparing the per-pixel differences against a threshold value and filtering them out. To achieve a considerable computation reduction, the threshold must be set to a large value, which in turn leads to a significant accuracy loss in recognition.…”
Section: A. Computation Reduction (mentioning, confidence: 99%)
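As a concrete illustration of the thresholded frame differencing this statement describes, below is a minimal Python/NumPy sketch of building a per-pixel change mask; the threshold tau and the function name are illustrative assumptions, not taken from the CBinfer paper.

    import numpy as np

    def change_mask(prev_frame, curr_frame, tau):
        # A pixel counts as "changed" if any channel differs from the
        # previous frame by more than the threshold tau.
        diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
        return (diff > tau).any(axis=-1)  # (H, W) boolean mask

    # The trade-off quoted above: a larger tau marks fewer pixels as
    # changed (more computation skipped), but also suppresses small
    # genuine changes, which is where the accuracy loss comes from.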
“…Accuracy: Figure 9 compares the mAP of the SQS implementation of Yolo-V3 [7] against 32-bit floating-point precision (FP32), the conventional quantization approach [28] (INT8), CBinfer [14], and DeepCache [15] implementations. The SQS is repeated for γ = 0.1, γ = 0.3, and γ = 0.5, using both symmetric, SQS(sym), and asymmetric, SQS(asym), quantization.…”
Section: B. Accuracy and Computation Complexity (mentioning, confidence: 99%)
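To make the symmetric/asymmetric distinction in this statement concrete, here is a small Python/NumPy sketch of the two per-tensor INT8 quantization mappings; the function names and per-tensor scaling are illustrative assumptions, not the exact scheme of SQS [28] or the other cited works.

    import numpy as np

    def quantize_symmetric(x, n_bits=8):
        # Symmetric: zero-point fixed at 0, scale chosen from max |x|.
        qmax = 2 ** (n_bits - 1) - 1                      # 127 for INT8
        scale = np.abs(x).max() / qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale                                   # x ~= scale * q

    def quantize_asymmetric(x, n_bits=8):
        # Asymmetric: a zero-point shifts the range, covering skewed
        # distributions (e.g., post-ReLU activations) more tightly.
        qmin, qmax = 0, 2 ** n_bits - 1                   # 0..255
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = int(np.round(qmin - x.min() / scale))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
        return q, scale, zero_point                       # x ~= scale * (q - zp)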
“…Overall, the feature maps later in the network are sparser, and this generally correlates with the number of feature maps (as in AlexNet). Feature maps following expanding 1×1 convolutions (e.g., layers 15, 17, 19, 21) generally show lower sparsity (25-40%) than those after the depthwise separable 3×3 convolutions (e.g., layers 16, 18, 20, 22; sparsity 50-65%), with exceptions for the latter (e.g., layers 8, 14, 28) where these convolutions are strided (sparsity 20-35%). This aligns with intuition, as the 1×1 layers combine feature maps to be filtered later, while the depthwise 3×3 convolution layers literally perform the filtering.…”
Section: B. Sparsity, Activation Histogram, and Data Layout (mentioning, confidence: 99%)
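The sparsity percentages quoted here are simply the fraction of exactly-zero activations in each post-ReLU feature map; below is a minimal Python/NumPy sketch of that measurement (the random feature map is illustrative only, not data from the paper).

    import numpy as np

    def activation_sparsity(feature_map):
        # Fraction of exactly-zero values in a post-ReLU feature map,
        # e.g., shaped (channels, height, width).
        return float((feature_map == 0).mean())

    # Example: ReLU of a zero-mean input zeroes roughly half the values.
    fm = np.maximum(np.random.randn(64, 28, 28), 0.0)
    print(f"sparsity: {activation_sparsity(fm):.1%}")  # ~50%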
“…Thus, bringing intelligence to the edge is creating fascinating challenges for industrial and academic researchers [6], [8]. Over the last few years, much research effort has gone into specialized hardware and optimized inference algorithms to run such NNs on power-constrained devices [15]-[17]. Today's IoT devices host microcontrollers, especially from the ARM Cortex-M family, which achieve power consumption on the order of mW and computational resources on the order of hundreds of MOPS [1], [18], [19].…”
Section: Introduction (mentioning, confidence: 99%)