Speech is a complex, naturally acquired human motor ability. In adults, it is characterized by the production of about 14 different sounds per second through the coordinated action of roughly 100 muscles. Speaker recognition is the capability of software or hardware to receive a speech signal, identify the speaker present in it, and recognize that speaker afterwards. Feature extraction converts the speech waveform into a parametric representation at a relatively low data rate for subsequent processing and analysis. Acceptable classification therefore depends on high-quality features. Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT) and Perceptual Linear Prediction (PLP) are the speech feature extraction techniques discussed in this chapter. These methods have been tested in a wide variety of applications, giving them a high level of reliability and acceptability. Researchers have made several modifications to these techniques to make them less susceptible to noise, more robust, and less time-consuming. In conclusion, none of the methods is superior to the others; the area of application determines which method to select.
Stuttered speech is dysfluency-rich speech, more prevalent in males than in females. It has been associated with insufficient air pressure or poor articulation, even though the root causes are more complex. Its primary features include prolonged and repetitive speech, while its secondary features include anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct stuttered speech. The results were evaluated using cepstral distance, Itakura-Saito distance, mean square error, and likelihood ratio. These measures implied perfect speech reconstruction quality. Automatic speech recognition (ASR) was used for further testing, and the results showed that all the reconstructed speech samples were perfectly recognized, while only three samples of the original speech were perfectly recognized. Keywords: stuttered speech, speech reconstruction, LPC analysis, LPC synthesis, objective quality measure
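The LPC analysis and synthesis steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the autocorrelation method with the Levinson-Durbin recursion, the predictor order of 10, the synthetic test signal, and all function names are assumptions made for illustration. Because the residual is passed to the synthesis filter unquantized, the reconstruction here is exact up to floating-point error, which a real coding pipeline would not generally achieve.

```python
import random

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficient list a (with a[0] == 1.0) and the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def analysis_filter(x, a):
    """LPC analysis: residual e[n] = sum_j a[j] * x[n-j] (FIR filter A(z))."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """LPC synthesis: y[n] = e[n] - sum_{j>=1} a[j] * y[n-j] (filter 1/A(z))."""
    y = []
    for n in range(len(e)):
        past = sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y.append(e[n] - past)
    return y

# Hypothetical test signal: a second-order autoregressive process,
# standing in for one frame of speech.
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(400):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
x = x[2:]

order = 10                                  # assumed predictor order
r = autocorrelation(x, order)
a, err = levinson_durbin(r, order)
residual = analysis_filter(x, a)
reconstructed = synthesis_filter(residual, a)

# Mean square error between original and reconstructed signal,
# one of the objective measures listed in the abstract.
mse = sum((u - v) ** 2 for u, v in zip(x, reconstructed)) / len(x)
```

The mean square error computed at the end is only one of the four measures the abstract lists; cepstral distance, Itakura-Saito distance, and likelihood ratio are not shown here.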
A level crossing (LX), or railway crossing, is an intersection between a public road and a railway line, and it can be controlled actively or passively. Sound recognition can be used to control a level crossing actively. This study proposes a system that uses sound to control an LX. The proposed system uses Mel Frequency Cepstral Coefficients (MFCC) as the feature extractor and a Recurrent Neural Network (RNN) as the classifier. The system has shown great potential to contribute to reducing the loss of lives and property at LXs.
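As background to the MFCC front end mentioned above, the sketch below shows the mel-scale mapping that MFCC filterbanks are commonly built on. The 2595 and 700 constants are the widely used textbook form of the mel formula; the abstract does not say which variant the authors used, and the 26 filters and 0 to 8000 Hz range are assumed values chosen only for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595 * log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Assumed configuration: 26 filters covering 0 Hz to 8000 Hz.
centers = mel_center_frequencies(26, 0.0, 8000.0)
```

Filters spaced evenly in mels sit close together at low frequencies and further apart at high frequencies, which is the perceptual warping that distinguishes MFCC from a plain cepstrum.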
Stuttering, or stammering, is a disruption of the normal flow of speech by dysfluencies, which can be repetitions or prolongations of a phoneme or syllable. Stuttering cannot be permanently cured, though it may go into remission, or stutterers can learn to shape their speech into fluent speech with appropriate speech pathology treatment. Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Line Spectral Frequencies (LSF) were used for feature extraction, while a Multilayer Perceptron (MLP) was used as the classifier. The samples were obtained from release 1 of UCLASS (University College London Archive of Stuttered Speech). The LPCC-MLP system had the highest overall sensitivity and precision and the lowest overall misclassification rate. The LPCC-MLP system had difficulty with F3: its sensitivity to F3 was negligible, its precision was moderate, and its misclassification rate, though described as negligible, was above 10%.
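The relationship between two of the feature sets named above, LPC and LPCC, can be sketched with the standard recursion that converts predictor coefficients into cepstral coefficients. This is an illustrative sketch rather than the authors' code: it assumes the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p with the model 1/A(z), omits the gain term c0, and uses a hypothetical function name.

```python
def lpc_to_lpcc(a, n_ceps):
    """Convert LPC coefficients to LPC cepstral coefficients c[1..n_ceps].

    `a` is the predictor polynomial [1, a1, ..., ap] for the all-pole
    model 1/A(z). The recursion used is
        c[n] = -a[n] - (1/n) * sum_{k} k * c[k] * a[n-k],
    where a[n] is taken as 0 for n > p and k runs over max(1, n-p)..n-1.
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)        # c[0] (gain term) is left unused
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = (-a[n] if n <= p else 0.0) - acc / n
    return c[1:]

# Single-pole check: for A(z) = 1 - 0.5 z^-1 the cepstrum of 1/A(z)
# is known in closed form, c[n] = 0.5**n / n.
example = lpc_to_lpcc([1.0, -0.5], 4)
```

The single-pole case is convenient for checking an implementation because the series expansion of the log of 1/(1 - 0.5 z^-1) gives the expected coefficients directly.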