2018
DOI: 10.1587/elex.15.20180212
EERA-DNN: An energy-efficient reconfigurable architecture for DNNs with hybrid bit-width and logarithmic multiplier

Abstract: This paper proposes an energy-efficient reconfigurable architecture for deep neural networks (EERA-DNN) with hybrid bit-width and a logarithmic multiplier. To speed up computation and achieve high energy efficiency, we first propose an efficient network compression method with a hybrid bit-width weight scheme, which reduces the memory storage of the LeNet, AlexNet, and EESEN networks by 7x-8x with negligible accuracy loss. Then, we propose an approximate unfolded logarithmic multiplier to process the multiplicat…

Cited by 8 publications (5 citation statements) · References 21 publications
“…Therefore, we can use approximate computing units with reduced power consumption to replace the standard computing units traditionally adopted in DNNs. In our previous work [10], [11], and [13], we proposed three digital approximate multiplication unit architectures to reduce DNN computing power consumption. These approximate multiplication units can be dynamically reconfigured to adapt to different accuracy requirements…”
Section: B. Approximate Computing for DNNs
confidence: 99%
“…The conventional DNN optimization methods are pruning, encoding, and quantization, which are discussed in [8]–[10]. In our previous work [11] and [12], we proposed several compression methods with a hybrid bit-width weight scheme, which reduce the memory storage of the typical DNN networks LeNet, AlexNet, and EESEN by 7x–8x. However, for KWS systems, where the adopted DNNs are typically compact networks customized for specific scenarios, these conventional network compression approaches based on pruning and encoding are likely to cause significant accuracy loss…”
Section: Preliminaries, A. Network Optimization Approaches for Low Powe...
confidence: 99%
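The hybrid bit-width weight scheme referenced above can be sketched as per-layer uniform quantization, where each layer is assigned its own bit-width. This is a minimal illustration under stated assumptions, not the papers' exact method; the layer names and per-layer bit-width assignment below are hypothetical.

```python
import numpy as np

def quantize(weights, bits):
    """Uniform symmetric quantization of a weight array to `bits` bits.

    Returns the dequantized weights and the scale factor. Rounding
    error per element is at most scale / 2.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale, scale

# Hybrid scheme: a (hypothetical) bit-width per layer. Moving from
# 32-bit floats to 4-8 bit integers is where the storage saving comes from.
layer_bits = {"conv1": 8, "conv2": 6, "fc1": 4}
rng = np.random.default_rng(0)
layers = {name: rng.standard_normal(100) for name in layer_bits}
compressed = {name: quantize(w, layer_bits[name])[0] for name, w in layers.items()}
```

In a real compression pipeline the bit-width assignment would be chosen per layer to balance storage against accuracy loss, rather than fixed as here.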
“…Thus, approximate multiplication units should be adopted in DNN processing because they can significantly improve energy efficiency at little cost in accuracy. In our previous work [18] and [11], we proposed two digital approximate multiplication unit architectures to reduce DNN computing power consumption. These two approximate multiplication units are customized for DNNs based on the iterative logarithmic multiplication principle [19]…”
Section: B. Energy-Efficient Approximate Computing for Customized DNNs
confidence: 99%
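The iterative logarithmic multiplication principle can be sketched in software as follows: Mitchell's approximation drops the product of the two mantissa residues (which is why it never overestimates), and each further iteration re-applies the same approximation to the dropped term to shrink the error. This is an illustrative arithmetic model, not the proposed hardware unit.

```python
def iter_log_mul(a: int, b: int, iters: int = 2) -> int:
    """Iterative logarithmic (Mitchell-style) approximate multiply for
    non-negative integers. Each extra iteration corrects the error term
    dropped by the previous approximation."""
    if a == 0 or b == 0:
        return 0
    ka, kb = a.bit_length() - 1, b.bit_length() - 1   # floor(log2)
    fa, fb = a - (1 << ka), b - (1 << kb)             # mantissa residues
    # Mitchell approximation: a*b ~ 2^(ka+kb) + fa*2^kb + fb*2^ka.
    # The exact product also contains fa*fb, so this underestimates.
    p = (1 << (ka + kb)) + (fa << kb) + (fb << ka)
    if iters > 1 and fa and fb:
        p += iter_log_mul(fa, fb, iters - 1)          # correct the dropped fa*fb
    return p
```

With one iteration the error is exactly the dropped fa·fb term; in hardware this trades a full multiplier array for shifters and adders, which is the source of the energy saving.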
“…However, NMS is a greedy algorithm that is computationally intensive, with a complexity of O(N²), leading to increased processing time when a large number of targets are detected. Many recent FPGA-based and ASIC edge neural network acceleration chips [7,8,9,10,11,12,13,14], such as UNPU [11], Eyeriss [12], and CASSANN-v2 [13], target general neural network operations (i.e., convolution). However, when deploying object detection neural networks, these chips often offload the NMS algorithm to the on-chip embedded CPU, significantly increasing the end-to-end inference time of object detection neural networks at the edge. Therefore, it is vital to develop a customized circuit to reduce the computation time of the NMS algorithm at the edge…”
Section: Introduction
confidence: 99%
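The greedy NMS procedure cited above, with its O(N²) worst case, can be sketched as a generic software baseline; this is not the customized circuit the work argues for, and the IoU threshold of 0.5 is a common but arbitrary choice.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and discard every remaining box that overlaps it too much.
    Worst case O(N^2) IoU evaluations, hence the motivation for a
    dedicated circuit at the edge."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```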