A 0.61-μW Fully Integrated Keyword-Spotting ASIC With Real-Point Serial FFT-Based MFCC and Temporal Depthwise Separable CNN

Li, Cai; Zhi, Haochang; Yang, Kaiyue; Qian, Junyi; Yan, Zhihao; Zhu, Lixuan; Chen, Chao; Wang, Xi; Shan, Weiwei

doi:10.1109/jssc.2023.3339528

Cited by 3 publications

References 37 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

Efficient Real-Time Smart Keyword Spotting Using Spectrogram-Based Hybrid CNN-LSTM for Edge System

Syafalni,

Amadeus,

Sutisna

et al. 2024

IEEE Access

View full text Add to dashboard Cite

Keyword Spotting (KWS) is the task of recognizing spoken command words from a database. With recent application human-machine interactions, KWS systems require real-time performance, where edge computing is a preferable option. To allow KWS systems to work on fast and real-time implementation, a low-complexity yet high-accurate AI model is mandatory. In this paper, we propose a comprehensive voice command recognition system design and its hardware implementation. The proposed AI model considered in this system is SpectroNet-based and an efficient hybrid CNN-LSTM architecture with low complexity. Jetson Xavier NX is an edge device because of its strong computational power as an embedded device. The implementation result shows the proposed method offers quite good in terms of accuracy, indicated by no accuracy drop between the model implemented in PC and Jetson Xavier. However, the inference time is quite high, which is 180 ms/step. To improve the speed of the system, the TensorRT library is used to further optimize the model. Optimization of the model is found effective, reducing 59.35% of the total operation performed in SpectroNet when FP32 precision is used, and 59.63% when FP16 precision is used. The model is also sped up by 45% if FP32 precision mode is used and 62% if FP16 precision mode is used. However, there is a slight accuracy drop of 2.68% if FP32 precision mode is used and 4.84% if FP16 precision mode is used. This slight drop in accuracy is considered negligible compared to the performance boost that TensorRT gives. The work is useful for intelligent control systems such as smart vehicles, smartphones, computers, and smart communications.

show abstract