2020
DOI: 10.1109/tcsvt.2019.2903421

CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams

Abstract: The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets. Applying these networks to images demands a high computational effort and pushes the use of state-of-the-art networks on real-time video data out of reach of embedded platforms. Many recent works focus on reducing network complexity for real-time inference on embedded computing platforms. We adopt an ortho…

Cited by 30 publications (31 citation statements); References 47 publications.
“…In a different line of work to ours, CBinfer [14] uses a software-level solution to increase the number of unchanged pixels between consecutive frames by comparing the per-pixel differences against a threshold value and filtering them out. To achieve a considerable computation reduction, the threshold must be set to a large value, which in turn leads to a significant accuracy loss in recognition.…”
Section: A. Computation Reduction (mentioning, confidence: 99%)
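As a concrete illustration of the thresholded frame differencing this statement describes, below is a minimal Python/NumPy sketch of building a per-pixel change mask; the threshold tau and the function name are illustrative assumptions, not taken from the CBinfer paper.

    import numpy as np

    def change_mask(prev_frame, curr_frame, tau):
        # A pixel counts as "changed" if any channel differs from the
        # previous frame by more than the threshold tau.
        diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
        return (diff > tau).any(axis=-1)  # (H, W) boolean mask

    # The trade-off quoted above: a larger tau marks fewer pixels as
    # changed (more computation skipped), but also suppresses small
    # genuine changes, which is where the accuracy loss comes from.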
“…Accuracy: Figure 9 compares the mAP of the SQS implementation of Yolo-V3 [7] against 32-bit floating-point precision (FP32), the conventional quantization approach [28] (INT8), CBinfer [14], and DeepCache [15] implementations. The SQS is repeated for γ = 0.1, γ = 0.3, and γ = 0.5, using both symmetric, SQS(sym), and asymmetric, SQS(asym), quantization.…”
Section: B. Accuracy and Computation Complexity (mentioning, confidence: 99%)
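To make the symmetric/asymmetric distinction in this statement concrete, here is a small Python/NumPy sketch of the two per-tensor INT8 quantization mappings; the function names and per-tensor scaling are illustrative assumptions, not the exact scheme of SQS [28] or the other cited works.

    import numpy as np

    def quantize_symmetric(x, n_bits=8):
        # Symmetric: zero-point fixed at 0, scale chosen from max |x|.
        qmax = 2 ** (n_bits - 1) - 1                      # 127 for INT8
        scale = np.abs(x).max() / qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale                                   # x ~= scale * q

    def quantize_asymmetric(x, n_bits=8):
        # Asymmetric: a zero-point shifts the range, covering skewed
        # distributions (e.g., post-ReLU activations) more tightly.
        qmin, qmax = 0, 2 ** n_bits - 1                   # 0..255
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = int(np.round(qmin - x.min() / scale))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
        return q, scale, zero_point                       # x ~= scale * (q - zp)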
“…Overall, the feature maps later in the network are sparser, and this generally correlates with the number of feature maps (as in AlexNet). Feature maps following expanding 1×1 convolutions (e.g., layers 15, 17, 19, 21) generally show lower sparsity (25-40%) than those after the depthwise separable 3×3 convolutions (e.g., layers 16, 18, 20, 22; sparsity 50-65%), with exceptions for the latter (e.g., layers 8, 14, 28) where these convolutions are strided (sparsity 20-35%). This aligns with intuition, as the 1×1 layers combine feature maps to be filtered later, while the depthwise 3×3 convolution layers literally perform the filtering.…”
Section: B. Sparsity, Activation Histogram, and Data Layout (mentioning, confidence: 99%)
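The sparsity percentages quoted here are simply the fraction of exactly-zero activations in each post-ReLU feature map; below is a minimal Python/NumPy sketch of that measurement (the random feature map is illustrative only, not data from the paper).

    import numpy as np

    def activation_sparsity(feature_map):
        # Fraction of exactly-zero values in a post-ReLU feature map,
        # e.g., shaped (channels, height, width).
        return float((feature_map == 0).mean())

    # Example: ReLU of a zero-mean input zeroes roughly half the values.
    fm = np.maximum(np.random.randn(64, 28, 28), 0.0)
    print(f"sparsity: {activation_sparsity(fm):.1%}")  # ~50%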
“…Thus, bringing intelligence to the edge is creating fascinating challenges for industrial and academic researchers [6], [8]. Over the last few years, much research effort has gone into specialized hardware and optimized inference algorithms to run such NNs on power-constrained devices [15]-[17]. Today's IoT devices host microcontrollers, especially from the ARM Cortex-M family, which achieve power consumption on the order of mW and computational resources on the order of hundreds of MOPS [1], [18], [19].…”
Section: Introduction (mentioning, confidence: 99%)