Automatic Speech Recognition (ASR) systems are highly sensitive to adverse acoustic conditions, which degrades their accuracy and limits their robustness. Because creating extensive speech databases is costly and time-consuming, improving robustness through the synthetic generation of speech data from pre-existing natural speech has become a prominent area of research. This paper examines the impact of standard data augmentation techniques, including pitch shift, time stretch, and volume control, as well as their combination, on the accuracy of isolated-word ASR systems. The performance of three machine learning models, namely Hidden Markov Models (HMM), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN), is analyzed on two Serbian corpora of isolated words. The Whi-Spe speech database in neutral phonation is used for augmentation and training, and a purpose-built Python software tool performs the augmentation. The experiments demonstrate a statistically significant reduction in Word Error Rate (WER) for the CNN-based recognizer on both testing datasets when a single augmentation technique, pitch shifting, is applied.
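The three augmentation techniques named above can be illustrated with a minimal NumPy sketch. This is not the paper's actual tool: the function names are hypothetical, and the time stretch and pitch shift use naive resampling by linear interpolation, whereas production tools (e.g. librosa) use a phase vocoder so that a time stretch does not alter pitch.

```python
import numpy as np

def change_volume(x, gain):
    # Volume control: scale the amplitude, clipping to the valid [-1, 1] range.
    return np.clip(x * gain, -1.0, 1.0)

def time_stretch(x, rate):
    # Naive time stretch by linear interpolation. Note this changes pitch as
    # well as duration; a phase-vocoder implementation would preserve pitch.
    n_out = int(round(len(x) / rate))
    old_idx = np.linspace(0, len(x) - 1, n_out)
    return np.interp(old_idx, np.arange(len(x)), x)

def pitch_shift(x, semitones):
    # Naive pitch shift by resampling: playing the clip faster raises pitch
    # but also shortens it, so the result is trimmed or zero-padded back to
    # the original length. Real augmentation tools correct duration properly.
    factor = 2.0 ** (semitones / 12.0)
    y = time_stretch(x, factor)
    if len(y) >= len(x):
        return y[: len(x)]
    return np.pad(y, (0, len(x) - len(y)))

# Example: augment a synthetic 100 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100.0 * t)
quiet = change_volume(tone, 0.5)          # half amplitude
slower = time_stretch(tone, 0.8)          # 25% longer clip
higher = pitch_shift(tone, 4)             # up four semitones, same length
```

A combined augmentation, as studied in the paper, would simply chain these functions on the same waveform before feature extraction.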