A 0.95 mJ/frame DNN Training Processor for Robust Object Detection with Real-World Environmental Adaptation

Han, Donghyeon; Im, Dongseok; Park, Gwangtae; Kim, Youngwoo; Song, Seokchan; Lee, Juhyoung; Yoo, Hoi‐Jun

doi:10.1109/aicas54282.2022.9869960

Cited by 5 publications

(9 citation statements)

References 10 publications


“…Adaptation after an unexpected situation such as a camera malfunction or abrupt domain change is also important to prevent fatal operational errors. [80] shows that online DNN tuning performed right after an unpredictable accident is one way to recover the original performance. As both examples show, on-device adaptation seems promising, but it must be accompanied by an energy-efficient and low-latency DNN training processor. Long training latency disturbs DNN inference and can cause other problems due to slow response.…”

Section: Adaptation: Short-term DNN Training (mentioning)

confidence: 93%

“…It wastes a lot of energy if there is no sparsity during DNN training. [29,80] focused on this drawback and utilized bit-slice (4-bit) level sparsity. Moreover, [29] skipped partial accumulation slices that would be truncated before they are used for the next layer.…”

Section: In- and Out-slice Skipping (mentioning)

confidence: 99%
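
The bit-slice (4-bit) sparsity mentioned in the statement above can be illustrated with a minimal sketch. It assumes unsigned 8-bit activations split into two 4-bit slices and skips only zero activation slices; the function names (`split_slices`, `slice_skipping_dot`) are hypothetical, and the cited processors may use a different slice encoding and skipping scheme.

```python
def split_slices(value, slice_bits=4, num_slices=2):
    """Split a non-negative integer into little-endian bit-slices."""
    mask = (1 << slice_bits) - 1
    return [(value >> (i * slice_bits)) & mask for i in range(num_slices)]


def slice_skipping_dot(activations, weights, slice_bits=4, num_slices=2):
    """Dot product that skips multiply-adds for zero activation slices.

    Assumption: activations are unsigned and fit in slice_bits * num_slices bits.
    Returns (result, skipped_slices, total_slices).
    """
    acc = 0
    skipped = 0
    total = 0
    for a, w in zip(activations, weights):
        for i, s in enumerate(split_slices(a, slice_bits, num_slices)):
            total += 1
            if s == 0:
                # A zero slice contributes nothing, so the operation is skipped.
                skipped += 1
                continue
            # Weight the partial product by the slice position.
            acc += (s * w) << (i * slice_bits)
    return acc, skipped, total


result, skipped, total = slice_skipping_dot([3, 16, 0, 255], [2, -1, 5, 1])
print(result, skipped, total)
```

In this example only one of the four activations is exactly zero, yet half of the 4-bit slices are zero, which is the kind of extra sparsity slice-level skipping is meant to exploit.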

“…2) New Fixed-point Representation: Fixed-point (FXP) representation needed much higher bit-precision compared with the FP. The required bit-precision can be reduced dramatically when dynamic FXP (DFXP) [3,29,34,80] is adopted. It adjusts the required integer length to fit the layer-wise narrow distribution instead of considering entire layers.…”

Section: A New Number Representation (mentioning)

confidence: 99%
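
As a rough illustration of the dynamic fixed-point idea in the statement above, the sketch below picks the integer length per tensor from its observed peak magnitude and spends the remaining bits on the fraction. The helper names, the 8-bit word size, and the peak-based rule are assumptions made for illustration, not the method of any cited processor.

```python
import math


def choose_integer_bits(values):
    """Smallest integer-bit count (excluding sign) that covers max |value|."""
    peak = max(abs(v) for v in values)
    if peak < 1.0:
        return 0
    return math.floor(math.log2(peak)) + 1


def quantize_dfxp(values, total_bits=8):
    """Quantize one tensor with a tensor-specific fraction length.

    Returns (integer codes, fraction bits). One bit is reserved for the sign.
    """
    int_bits = choose_integer_bits(values)
    frac_bits = total_bits - 1 - int_bits
    scale = 2.0 ** frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    # Round to the nearest code and saturate at the word limits.
    codes = [min(hi, max(lo, round(v * scale))) for v in values]
    return codes, frac_bits


def dequantize(codes, frac_bits):
    """Map integer codes back to real values."""
    return [c / 2.0 ** frac_bits for c in codes]


narrow_codes, narrow_frac = quantize_dfxp([0.1, -0.25, 0.4])
wide_codes, wide_frac = quantize_dfxp([3.0, -6.5, 1.25])
print(narrow_frac, wide_frac)
```

A tensor with a narrow distribution gets more fraction bits than a wide one within the same word size, which is why a per-layer integer length can lower the bit precision needed overall.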

“…2) Active Training Supporting Unit: Both FGMP [21] and DFXP [3,29,34,35,80] need streaming data analysis units that can calculate the mean/variance or overflow ratio of the OA. Based on the analyzed tensor-wise statistics, the FGMP converts more than 90% of FP16 accumulation results to FP8 operands during the ResNet-18 training.…”

Section: Multiple-precision Configurable Multiply-add Unit (mentioning)

confidence: 99%
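
A streaming analysis unit of the kind described above could be modeled in software roughly as follows. This sketch uses Welford's online algorithm for the mean and variance and a simple magnitude threshold for the overflow ratio; the class name `StreamingStats` and the threshold definition are assumptions, and real hardware units may compute these statistics differently.

```python
class StreamingStats:
    """One-pass mean, variance, and overflow ratio over a stream of values."""

    def __init__(self, overflow_threshold):
        self.threshold = overflow_threshold  # assumed: |x| >= threshold overflows
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.overflows = 0

    def update(self, x):
        # Welford's update: no need to store earlier samples.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        if abs(x) >= self.threshold:
            self.overflows += 1

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self.m2 / self.count if self.count else 0.0

    @property
    def overflow_ratio(self):
        """Fraction of values whose magnitude reached the threshold."""
        return self.overflows / self.count if self.count else 0.0


stats = StreamingStats(overflow_threshold=8.0)
for value in [1.0, 2.0, 3.0, 10.0]:
    stats.update(value)
print(stats.mean, stats.variance, stats.overflow_ratio)
```

Because each value is visited once and only a few running totals are kept, the statistics can be gathered while accumulation results stream out, without buffering the whole tensor.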

“…Sections V, VI, VII, and VIII introduce 1) sparsity-aware acceleration, 2) bit-precision optimization, 3) memory access optimization, and 4) backward unlocking methodologies. Section IX summarizes recently developed DNN training processors and introduces two design examples, HNPU-V1 [29] and HNPU-V2 [80], along with the design philosophies of inference and training processor design. The paper concludes with a discussion of future research directions and the new challenges expected in upcoming DNN training processors.…”

Section: Introduction (mentioning)

confidence: 99%


Energy-Efficient DNN Training Processors on Micro-AI Systems

Han, Kang, Kim, et al. 2022

IEEE Open J. Solid-State Circuits Soc.


Many edge/mobile devices are now able to utilize deep neural networks (DNNs) thanks to the development of mobile DNN accelerators. Mobile DNN accelerators overcame the problems of limited computing resources and battery capacity by realizing energy-efficient inference. However, their passive behavior makes it difficult for a DNN to provide active customization for individual users or its service environment. On-chip training is becoming increasingly important for enabling active interaction between DNN processors and ever-changing surroundings or conditions. Despite its advantages, DNN training has more constraints than inference, so it was considered impractical to realize on mobile/edge devices. Recently, there have been many attempts to realize mobile DNN training, and this paper summarizes a number of prior works. First, it lays out the new challenges that training functionality imposes on the DNN accelerator and discusses new hardware features related to those challenges. Second, it explains algorithm-hardware co-optimization methods and why they have become mainstream in mobile DNN training research. Third, it compares the main differences between conventional inference accelerators and recent training processors. Finally, it concludes by proposing future directions for DNN training processors in micro-AI systems.

