Deep neural networks have been deployed on various hardware accelerators, such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuit (ASIC) chips. Inference typically requires a huge amount of computation, creating significant logic resource overhead. In addition, frequent data accesses between off-chip memory and the accelerator create bottlenecks, degrading hardware efficiency. Many solutions have been proposed to reduce hardware overhead and data movement. For example, lookup-table (LUT)-based hardware architectures can be used to reduce the demand for arithmetic operations. However, typical LUT-based accelerators suffer from limited computational precision and poor scalability. In this paper, we propose a search-based computing scheme built on an LUT solution, which improves computation efficiency by replacing traditional multiplication with a search operation. In addition, the proposed scheme supports multiple bit widths to meet the precision requirements of different DNN-based applications. We also design a reconfigurable computing strategy that efficiently adapts to convolutions with different kernel sizes, improving hardware scalability. We implement a search-based computing architecture, SCA, which adopts an on-chip storage mechanism that greatly reduces interaction with off-chip memory and alleviates bandwidth pressure. In our experimental evaluation, the proposed SCA architecture achieves computational utilization of 92%, 96%, and 98% at 4-bit, 8-bit, and 16-bit precision, respectively. Compared with a state-of-the-art LUT-based architecture, efficiency improves four-fold.
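To make the core idea concrete, the following is a minimal software sketch of how multiplication can in principle be replaced by a table lookup, assuming 4-bit activations and a per-weight table of precomputed products; it is an illustration of the general LUT-based computing principle, not the SCA hardware design itself, and all names in it are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Conceptual sketch (not the paper's SCA design): with 4-bit activations,
 * the products of one weight with all 16 possible activation values can be
 * precomputed, so each multiply-accumulate becomes a lookup plus an add. */

#define ACT_LEVELS 16  /* 2^4 possible 4-bit activation values */

/* Precompute the product table for one weight. */
static void build_lut(int32_t lut[ACT_LEVELS], int8_t weight) {
    for (int a = 0; a < ACT_LEVELS; ++a)
        lut[a] = (int32_t)weight * a;
}

/* Dot product in which every multiplication is replaced by a lookup. */
static int32_t lut_dot(int32_t luts[][ACT_LEVELS],
                       const uint8_t *acts, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc += luts[i][acts[i] & 0x0F];  /* lookup instead of multiply */
    return acc;
}

int main(void) {
    int8_t  weights[3] = { 3, -2, 5 };
    uint8_t acts[3]    = { 7, 1, 4 };   /* 4-bit activations */
    int32_t luts[3][ACT_LEVELS];

    for (int i = 0; i < 3; ++i)
        build_lut(luts[i], weights[i]);

    /* 3*7 + (-2)*1 + 5*4 = 39 */
    printf("dot = %d\n", lut_dot(luts, acts, 3));
    return 0;
}
```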