Low-Power Computer Vision: Status, Challenges, and Opportunities

Alyamkin, Sergei; Ardi, Matthew; Berg, Alexander C.; Brighton, Achille; Chen, Bin; Chen, Yiran; Cheng, Hsin-Pai; Fan, Zichen; Feng, Chen; Fu, Bo; Gauen, Kent; Goel, Abhinav; Goncharenko, Alexander; Guo, Xuyang; Ha, Soonhoi; Howard, Andrew; Hu, Xiao; Huang, Yue-Xin; Kang, Dong Heon; Kim, Jaeyoun; Ko, Jong Gook; Kondratyev, A. I.; Lee, Junhyeok; Lee, Seungjae; Lee, Suwoong; Li, Zichao; Liang, Zhiyu; Liu, Juzheng; Liu, Xin; Lü, Yang; Lu, Yung-Hsiang; Malik, Deeptanshu; Nguyen, Hong Hanh; Park, Eunbyung; Repin, Denis; Shen, Liang; Sheng, Tao; Sun, Fei; Svitov, David; Thiruvathukal, George K.; Zhang, Baiwu; Zhang, Jingchi; Zhang, Xiaopeng; Zhuo, Shaojie

doi:10.1109/jetcas.2019.2911899

Cited by 66 publications

(31 citation statements)

References 25 publications


“…The output of the vcgencmd get_throttled command is used to identify if the Raspberry Pi is throttled. On embedded systems, most image classification applications perform inference on one image at a time [2,11,23,24,82,83]. Even when processing videos, inference can be performed on individual frames each time [84].…”

Section: Datasets Used (mentioning)

confidence: 99%

Modular Neural Networks for Low-Power Image Classification on Embedded Devices

Goel

Aghajanzadeh

Tung

et al. 2020

ACM Trans. Des. Autom. Electron. Syst.

Self Cite


Embedded devices are generally small, battery-powered computers with limited hardware resources. It is difficult to run deep neural networks (DNNs) on these devices, because DNNs perform millions of operations and consume significant amounts of energy. Prior research has shown that a considerable number of a DNN's memory accesses and computation are redundant when performing tasks like image classification. To reduce this redundancy and thereby reduce the energy consumption of DNNs, we introduce the Modular Neural Network Tree architecture. Instead of using one large DNN for the classifier, this architecture uses multiple smaller DNNs (called modules) to progressively classify images into groups of categories based on a novel visual similarity metric. Once a group of categories is selected by a module, another module then continues to distinguish among the similar categories within the selected group. This process is repeated over multiple modules until we are left with a single category. The computation needed to distinguish dissimilar groups is avoided, thus reducing redundant operations, memory accesses, and energy. Experimental results using several image datasets reveal the effectiveness of our proposed solution to reduce memory requirements by 50% to 99%, inference time by 55% to 95%, energy consumption by 52% to 94%, and the number of operations by 15% to 99% when compared with existing DNN architectures, running on two different embedded systems: Raspberry Pi 3 and Raspberry Pi Zero. CCS Concepts: • Computing methodologies → Neural networks; Computer vision; • Computer systems organization → Embedded systems; • Hardware → Power and energy;
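The routing idea in this abstract (one small module picks a group, then another module refines within that group until one category remains) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Module` class, the `classify` function, and the threshold lambdas standing in for small DNNs are all hypothetical, and the grouping here is hand-written rather than derived from the paper's visual similarity metric.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Module:
    """One small classifier in the tree (hypothetical structure)."""
    # Returns the name of the group or category this module selects.
    predict: Callable[[Sequence[float]], str]
    # Maps a selected group name to the module that refines it.
    children: Dict[str, "Module"] = field(default_factory=dict)


def classify(root: Module, image: Sequence[float]) -> Tuple[str, List[str]]:
    """Walk from the root until a module outputs a single category.

    Only the modules on the chosen path run; modules for dissimilar
    groups are never evaluated, which is where the savings come from.
    """
    node = root
    path: List[str] = []
    while True:
        choice = node.predict(image)
        path.append(choice)
        if choice not in node.children:
            return choice, path  # no further module: this is a category
        node = node.children[choice]


# Toy stand-ins for small DNNs (assumption: real modules would be networks).
animals = Module(predict=lambda x: "cat" if x[1] < 0.5 else "dog")
vehicles = Module(predict=lambda x: "car" if x[1] < 0.5 else "truck")
root = Module(
    predict=lambda x: "animals" if x[0] < 0.5 else "vehicles",
    children={"animals": animals, "vehicles": vehicles},
)

label, path = classify(root, [0.2, 0.9])
print(label, path)
```

In this toy tree each query runs two modules out of three; the `vehicles` module is skipped whenever the root selects `animals`.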


“…Low-Power Computer Vision: Goel et al. [7] survey low-power DNNs and describe the benefits of reducing memory and operations for low-power applications. DNN quantization reduces the memory requirement [19] and DNN pruning reduces the DNN operations [20]. Although these techniques increase the efficiency of existing large DNNs, they generally lower the accuracy as well.…”

Section: B Related Work (mentioning)

confidence: 99%

Low-power object counting with hierarchical neural networks

Goel

Tung

Aghajanzadeh

et al. 2020

Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design

Self Cite


Low-power computer vision on embedded devices has many applications. This paper describes a low-power technique for the object re-identification (reID) problem: matching a query image against a gallery of previously-seen images. State-of-the-art techniques rely on large, computationally-intensive Deep Neural Networks (DNNs). We propose a novel hierarchical DNN architecture that uses attribute labels in the training dataset to perform efficient object reID. At each node in the hierarchy, a small DNN identifies a different attribute of the query image. The small DNN at each leaf node is specialized to re-identify a subset of the gallery: only the images with the attributes identified along the path from the root to a leaf. Thus, a query image is re-identified accurately after processing with a few small DNNs. We compare our method with state-of-the-art object reID techniques. With a ∼4% loss in accuracy, our approach realizes significant resource savings: 74% less memory, 72% fewer operations, and 67% lower query latency, yielding 65% less energy consumption.


“…MobileNetV2 [3] and MobileNetV3 [4] further reduce parameters. LW CNN are deployed on mobile phones, such as Pixel 2 or Pixel 2XL [5], which still have 4GB memory. However, some IoT edge device, such as microcontrollers, are even more memory-limited.…”

Section: Introduction (mentioning)

confidence: 99%

“…We integrate TF Lite in our toolchain, and implement function of TF Lite and two round-to-nearest functions of gemmlowp library to get high-accuracy result. We also design flexible PE arrays, which support kernel-level parallelism with three different size (3,5,7). Besides spatial reuse, temporal reuse is adopted at CONV layer (both row and column level of an IFM) and FC layer.…”

Section: Introduction (mentioning)

confidence: 99%

HBDCA: A Toolchain for High-Accuracy BRAM-Defined CNN Accelerator on FPGA with Flexible Structure

Gao

Lai

2021

IEICE Trans. Inf. & Syst.


In recent years FPGA has become popular in CNN acceleration, and many CNN-to-FPGA toolchains are proposed to fast deploy CNN on FPGA. However, for these toolchains, updating CNN network means regeneration of RTL code and re-implementation which is time-consuming and may suffer timing-closure problems. So, we propose HBDCA: a toolchain and corresponding accelerator. The CNN on HBDCA is defined by the content of BRAM. The toolchain integrates UpdateMEM utility of Xilinx, which updates content of BRAM without re-synthesis and re-implementation process. The toolchain also integrates TensorFlow Lite which provides high-accuracy quantization. HBDCA supports 8-bits per-channel quantization of weights and 8-bits per-layer quantization of activations. Upgrading CNN on accelerator means the kernel size of CNN may change. Flexible structure of HBDCA supports kernel-level parallelism with three different sizes (3 × 3, 5 × 5, 7 × 7). HBDCA implements four types of parallelism in convolution layer and two types of parallelism in fully-connected layer. In order to reduce access number to memory, both spatial and temporal data-reuse techniques were applied on convolution layer and fully-connected layer. Especially, temporal reuse is adopted at both row and column level of an Input Feature Map of convolution layer. Data can be just read once from BRAM and reused for the following clock. Experiments show by updating BRAM content with single UpdateMEM command, three CNNs with different kernel size (3 × 3, 5 × 5, 7 × 7) are implemented on HBDCA. Compared with traditional design flow, UpdateMEM reduces development time by 7.6X-9.1X for different synthesis or implementation strategy. For similar CNN which is created by toolchain, HBDCA has smaller latency (9.97 µs-50.73 µs), and eliminates re-implementation when update CNN. For similar CNN which is created by dedicated design, HBDCA also has the smallest latency 9.97 µs, the highest accuracy 99.14% and the lowest power 1.391 W. For different CNN which is created by similar toolchain which eliminate re-implementation process, HBDCA achieves higher speedup 120.28X.
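The distinction this abstract draws between per-channel and per-layer 8-bit quantization can be illustrated with a small sketch. It assumes symmetric quantization with one scale per group of values, which is a common scheme for weights but is not confirmed by the abstract as HBDCA's exact method; the function names are hypothetical, and Python's built-in `round` (round half to even) stands in for whatever rounding the toolchain uses.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers using a single shared scale (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Note: round() rounds halves to even; a real toolchain may round differently.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_per_channel(weights: List[List[float]]) -> List[Tuple[List[int], float]]:
    """One scale per output channel (each inner list is one channel)."""
    return [quantize_symmetric(channel) for channel in weights]


def quantize_per_tensor(weights: List[List[float]]) -> Tuple[List[int], float]:
    """One scale shared by every value in the layer."""
    flat = [v for channel in weights for v in channel]
    return quantize_symmetric(flat)


def max_error(values: List[float], q: List[int], scale: float) -> float:
    """Largest absolute difference after dequantizing back to floats."""
    return max(abs(v - qi * scale) for v, qi in zip(values, q))


# Two channels with very different magnitudes (made-up numbers).
weights = [[0.5, -1.0, 0.25], [0.01, -0.02, 0.005]]

per_channel = quantize_per_channel(weights)
q_all, scale_all = quantize_per_tensor(weights)

small = weights[1]
err_channel = max_error(small, *per_channel[1])
err_tensor = max_error(small, q_all[3:], scale_all)
print(err_channel, err_tensor)
```

With a shared scale, the small-magnitude channel is squeezed into a handful of integer levels, so its reconstruction error is much larger than when it gets its own scale.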

