LIS-Net: An end-to-end light interior search network for speech command recognition

Anh, Nguyễn Tuấn; Hu, Yongjian; He, Qianhua; Linh, Tran Thi Ngoc; Dung, Hoang Thi Kim; Chen, Guang

doi:10.1016/j.csl.2020.101131

Cited by 7 publications (3 citation statements)

References 12 publications

“…MS is used in conjunction with CNN, and it can distinguish vowel sounds, although the aggregate dataset is more complex due to many dimensions, such as various noises, ages, accents, environments, and physical characteristics (i.e., female vs. male voices). In the same way [46], MS was applied to the speech command recognition (SCR) task and achieved good performance. MS images with a feature size of 125 × 80 × 1 were used as acoustic features.…”

Section: Discussion (mentioning)

An Acoustic Feature-Based Deep Learning Model for Automatic Thai Vowel Pronunciation Recognition

Rukwong; Pongpinigpinyo. Applied Sciences, 2022.

For Thai vowel pronunciation, it is very important to know that when mispronunciation occurs, the meanings of words change completely. Thus, effective and standardized practice is essential for pronouncing words correctly, as a native speaker would. Since the COVID-19 pandemic, online learning has become increasingly popular. For example, an online pronunciation application system was introduced that has virtual teachers and an intelligent process for evaluating students, similar to standardized training by a teacher in a real classroom. This research presents an online automatic computer-assisted pronunciation training (CAPT) system that uses deep learning to recognize Thai vowels in speech. The automatic CAPT is developed to address the shortage of instruction specialists and the complexity of the vowel teaching process. It is a unique system that integrates computer techniques with linguistic theory. The deep learning model is the most significant component of the automatic CAPT for recognizing pronounced vowels. The major challenge in Thai vowel recognition is the correct identification of Thai vowels when spoken in real-world situations. A convolutional neural network (CNN), a deep learning model, is developed and applied to classify pronounced Thai vowels. A new dataset for Thai vowels was designed, collected, and examined by linguists. The optimal CNN model with Mel spectrogram (MS) features achieves the highest accuracy, 98.61%, compared with 94.44% for Mel frequency cepstral coefficients (MFCC) with the baseline long short-term memory (LSTM) model and 90.00% for MS with the baseline LSTM model.
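The abstract compares Mel spectrogram (MS) inputs with Mel frequency cepstral coefficients (MFCC). In the standard formulation, MFCCs are derived from the same log-Mel energies by applying a type-II discrete cosine transform (DCT) along the Mel axis and keeping only the first few coefficients. The following is a minimal NumPy sketch of that relationship, assuming an orthonormal DCT-II and 13 retained coefficients (a conventional choice, not one stated in the abstract); `mfcc_from_log_mel` is a hypothetical helper name, not the authors' code.

```python
import numpy as np


def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """Apply an orthonormal DCT-II along the Mel axis of a (frames, n_mels) array.

    Hypothetical helper: keeps the first `n_mfcc` cepstral coefficients.
    """
    n_mels = log_mel.shape[-1]
    k = np.arange(n_mfcc)[:, None]   # cepstral (output) index
    m = np.arange(n_mels)[None, :]   # Mel band (input) index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n_mels))  # (n_mfcc, n_mels)
    basis *= np.sqrt(2.0 / n_mels)   # orthonormal scaling for rows k >= 1
    basis[0] *= np.sqrt(0.5)         # row k = 0 ends up scaled by sqrt(1 / n_mels)
    return log_mel @ basis.T         # (frames, n_mfcc)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_mel = rng.normal(size=(125, 80))  # stand-in for a real log-Mel spectrogram
    mfcc = mfcc_from_log_mel(log_mel)
    print(mfcc.shape)  # (125, 13)
```

The result matches `scipy.fft.dct(log_mel, type=2, norm='ortho', axis=-1)[:, :n_mfcc]`; the basis is written out explicitly only to keep the sketch dependent on NumPy alone. Truncating to a few coefficients is what makes MFCCs a compact, decorrelated summary of the spectral envelope, whereas the MS keeps the full band-by-band energy map.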

“…Mel spectrograms (MS) were converted from the raw speech signal (16 kHz) and then applied to the speech command recognition (SCR) task [46]. MS images with the feature size of 125 × 80 × 1 were used as acoustic features.…”

Section: Acoustic Features (mentioning)
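The quoted statement gives only the input rate (16 kHz) and the resulting feature size (125 × 80 × 1). The sketch below shows one way such a tensor can arise, assuming 1-second clips (16,000 samples) and hypothetical framing parameters chosen purely so the arithmetic lands on 125 frames: a 400-sample (25 ms) Hann window, a 125-sample hop, a 512-point FFT, and 80 triangular filters on the HTK Mel scale. With no edge padding, the frame count is 1 + ⌊(16000 − 400) / 125⌋ = 125. None of these settings are confirmed as LIS-Net's; the constants and helper names (`mel_filterbank`, `log_mel_spectrogram`) are illustrative assumptions.

```python
import numpy as np

# Hypothetical framing parameters: chosen only so that a 1 s clip at 16 kHz
# yields 125 frames. The actual LIS-Net settings are not given in this source.
SAMPLE_RATE = 16_000
WIN_LENGTH = 400   # 25 ms analysis window
HOP_LENGTH = 125   # ~7.8 ms hop between frames
N_FFT = 512        # each window is zero-padded to 512 points
N_MELS = 80


def hz_to_mel(hz):
    # HTK-style Mel scale
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels Mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2, n_bins)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(samples):
    """Return a (frames, n_mels, 1) log-Mel 'image' for a mono 16 kHz signal."""
    n_frames = 1 + (len(samples) - WIN_LENGTH) // HOP_LENGTH  # no edge padding
    window = np.hanning(WIN_LENGTH)
    frames = np.stack([
        samples[i * HOP_LENGTH : i * HOP_LENGTH + WIN_LENGTH] * window
        for i in range(n_frames)
    ])                                                          # (frames, 400)
    power = np.abs(np.fft.rfft(frames, n=N_FFT, axis=1)) ** 2   # (frames, 257)
    mel = power @ mel_filterbank(SAMPLE_RATE, N_FFT, N_MELS).T  # (frames, 80)
    log_mel = np.log(mel + 1e-10)    # small floor avoids log(0) on silence
    return log_mel[..., np.newaxis]  # trailing channel axis for a 2D CNN


if __name__ == "__main__":
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE    # 1 s time axis
    clip = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # synthetic 440 Hz tone
    features = log_mel_spectrogram(clip)
    print(features.shape)  # (125, 80, 1)
```

The trailing channel axis is what turns the 125 × 80 matrix into a single-channel "image" that a 2D CNN can consume. Taking the log of the Mel power is a common choice but is likewise an assumption here; a different window, hop, or padding scheme would change the frame count.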

An Acoustic Feature-Based Deep Learning Model for Automatic Thai Vowel Pronunciation Recognition

Rukwong; Pongpinigpinyo. Applied Sciences, 2022.

“…Speech command-based applications are widely used in various fields and have significantly enhanced human-computer interaction [6]. Speech recognition interfaces are integrated into digital devices, e-commerce, e-learning, the Internet of Things, robotics, and medical equipment to facilitate control and monitor through speech input [7,8].…”

Section: Introduction (mentioning)

Afan Oromo Speech-Based Computer Command and Control: An Evaluation with Selected Commands

Teshite; Mamo; Calpotura. Advances in Human-Computer Interaction, 2023.

Speech-based computer command and control uses natural speech to enable computers to understand human language and execute tasks through commands. However, there has been no study or development of a speech-based command and control system for Microsoft Word in Afan Oromo. The primary aim of this research is to investigate and develop a speech-based command and control system for Afan Oromo using a selected set of command-and-control words from MS Word. To accomplish this objective, a small-vocabulary, isolated-word, speaker-independent, HMM-based speech recognizer was developed with the HTK toolkit. The selected MS Word command words were translated from English to Afan Oromo in order to build this automatic speech-based computer command system. Audio recordings were obtained from 38 speakers (16 females and 22 males) aged between 18 and 40 years, based on their availability. Word-level speech recognition was performed using MFCC features and data processing, both widely used and effective approaches in speech recognition. Out of a total of 64 MS Word command words, 54 words (84.37%) were used for training and 10 words (15.63%) were used for testing. Live and nonlive evaluation techniques were employed to assess the performance of the recognizer. The live recognizer, which considers variations in the environment, outperformed the nonlive recognizer due to the influence of neighboring phones. The performance results for the monophone tied-state, triphone, and triphone-based recognizers were 78.12%, 86.87%, and 88.99%, respectively. Thus, the triphone-based recognizer exhibited the best performance among the nonlive recognizers. Because of limited resources, the study was restricted to speech-based commands for computers using only selected MS Word commands, which play a crucial role in text processing. In addition, no object-as-a-service components were available for evaluating a speech-based interface in a real environment. The experimental findings of this study demonstrated that, given an adequate amount of language resources, a computer-based Afan Oromo speech interface for command-and-control purposes could be developed.

Robust Voice Activity Detection Based on Feature Fusion and Recurrent Neural Network

Dahy; Darwish; Hassanein. Lecture Notes on Data Engineering and Communications Technologies, 2024.
