SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems

Chen, Yuxuan; Zhang, Jiangshan; Yuan, Xuejing; Zhang, Shengzhi; Chen, Kai; Guo, Shanqing

doi:10.48550/arxiv.2103.10651

Cited by 1 publication

(1 citation statement)

References 90 publications

(222 reference statements)


“…ASR is an active and essential research area owing to its wide range of applications, such as security [2], education [3], smart healthcare [4, 5], and smart cities [6], as well as the development of interfaces and computing instruments that can enable voice processing. It is a combination of various approaches that assist in the conversion of acoustic data into text, using text matching applied to the detected speech signal occurring in the result.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language

Mukhamadiyev, Khujayorov, Djuraev, et al. (2022). Sensors.

Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few examples of speech recognition applications. Most studies have mainly concentrated on English, Spanish, Japanese, or Chinese, disregarding other low-resource languages, such as Uzbek, leaving their analysis open. In this paper, we propose an End-To-End Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves speech recognition accuracy by effectively using CTC objective function in attention model training. We evaluated the linguistic and lay-native speaker performances on the Uzbek language dataset, which was collected as a part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% using 207 h of recordings as an Uzbek language training dataset.
