<span>Many technology systems use voice recognition applications to transcribe a speaker’s speech into text that the systems can then use. One of the most complex tasks in speech identification is knowing which acoustic cues to use to classify sounds. This study presents an approach for characterizing Arabic fricative consonants into two groups (sibilant and non-sibilant). From an acoustic point of view, our approach is based on analyzing the energy distribution across frequency bands in a consonant-vowel (CV) syllable. From a practical point of view, our technique has been implemented in MATLAB and tested on a corpus built in our laboratory. The results show that the percentage energy distribution of the speech signal is a highly effective parameter for classifying Arabic fricatives. We obtained an accuracy of 92% for the non-sibilant consonants /f, χ, ɣ, ʕ, ħ, h/, 84% for the sibilants /s, sˤ, z, ʒ, ʃ/, and an overall classification rate of 89%. Compared with other algorithms based on neural networks and support vector machines (SVMs), our classification system achieved a higher classification rate.</span>
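The percentage energy distribution described above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation: the band edges in `BANDS_HZ`, the 16 kHz sampling rate, and the threshold rule in `hypothetical_is_sibilant` are all assumptions chosen for illustration, since the abstract gives neither the bands nor the decision procedure.

```python
import numpy as np

def band_energy_percentages(signal, fs, band_edges_hz):
    """Share (%) of the signal's spectral energy in each (low, high) band, in Hz."""
    x = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2          # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # bin centre frequencies in Hz
    total = power.sum()
    if total == 0.0:
        return np.zeros(len(band_edges_hz))      # silent input: no energy anywhere
    shares = [
        100.0 * power[(freqs >= low) & (freqs < high)].sum() / total
        for low, high in band_edges_hz
    ]
    return np.array(shares)

# Hypothetical bands for 16 kHz audio; the study's own bands are not given here.
BANDS_HZ = [(0, 1000), (1000, 4000), (4000, 8000)]

def hypothetical_is_sibilant(shares, threshold_pct=50.0):
    """Toy rule (assumption, not the paper's): most energy in the top band."""
    return shares[-1] > threshold_pct

if __name__ == "__main__":
    fs = 16000
    t = np.arange(1600) / fs
    # A 6 kHz tone stands in for high-frequency frication noise.
    shares = band_energy_percentages(np.sin(2 * np.pi * 6000 * t), fs, BANDS_HZ)
    print(np.round(shares, 2))               # the top band holds ~100%
    print(hypothetical_is_sibilant(shares))  # True
```

Expressing each band as a percentage of the total makes the feature insensitive to the overall recording gain, which is one plausible reason a percentage distribution is convenient for comparing tokens from different speakers.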
<span>The speech signal is characterized by many acoustic properties that may contribute differently to spoken word recognition. Vowel characterization is an important means of studying the acoustic characteristics and behavior of speech in different contexts. This study focuses on the modulator characteristics of three Arabic vowels: we propose a new approach to characterize the Arabic vowels /a/, /i/, and /u/ based on the energy contained in the speech modulators. The coherent subband demodulation method based on the spectral center of gravity (COG) was used to calculate the energy of the speech modulators. The results show that modulator energy helps characterize the Arabic vowels /a/, /i/, and /u/, with recognition rates ranging from 86% to 100%.</span>
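A rough sketch of coherent subband demodulation with a spectral center of gravity (COG) carrier is shown below. It assumes a single global COG per subband (the method used in the study may track the carrier frame by frame), ideal FFT-mask filters, and a hypothetical modulator cutoff `mod_cutoff_hz`; the function name `coherent_subband_modulator` is illustrative. Modulator energy is taken here as the sum of squared magnitudes of the demodulated signal.

```python
import numpy as np

def coherent_subband_modulator(x, fs, band_hz, mod_cutoff_hz=50.0):
    """Sketch: coherent demodulation of one subband using a COG carrier.

    Returns (cog_hz, modulator, energy). Assumptions: one global carrier per
    subband (no frame-wise tracking), ideal FFT-mask filters, band_hz[0] > 0.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    low, high = band_hz

    # Keep only the positive-frequency bins of the band -> analytic subband signal.
    band_spec = np.where((freqs >= low) & (freqs < high), spectrum, 0.0)
    subband = np.fft.ifft(2.0 * band_spec)

    # Carrier = spectral centre of gravity (power-weighted mean frequency) of the band.
    power = np.abs(band_spec) ** 2
    if power.sum() == 0.0:
        return 0.0, np.zeros(n, dtype=complex), 0.0
    cog_hz = float(np.sum(freqs * power) / np.sum(power))

    # Coherent demodulation: shift the band down by the carrier, then low-pass.
    t = np.arange(n) / fs
    baseband_spec = np.fft.fft(subband * np.exp(-2j * np.pi * cog_hz * t))
    baseband_spec[np.abs(freqs) > mod_cutoff_hz] = 0.0
    modulator = np.fft.ifft(baseband_spec)

    energy = float(np.sum(np.abs(modulator) ** 2))
    return cog_hz, modulator, energy

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # 1 kHz carrier with a 4 Hz amplitude modulation as a synthetic test signal.
    x = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)
    cog, modulator, energy = coherent_subband_modulator(x, fs, (800, 1200))
    print(round(cog, 3), round(energy / len(x), 3))  # 1000.0 1.125
```

For this amplitude-modulated tone analyzed in the 800 to 1200 Hz band, the COG lands at 1000 Hz and the recovered modulator is the 4 Hz envelope `1 + 0.5*cos(2π·4t)`, so the per-sample modulator energy is 1 + 0.5²/2 = 1.125.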
Speech activity detection is a crucial preprocessing step in many fields, such as speech recognition, audio forensics, audio conferencing, and text-to-speech applications. In speech processing, it can be used to deactivate various operations during non-speech segments. This paper proposes an accurate and robust approach to classifying voiced and unvoiced segments. To this end, the proposed algorithm combines two techniques: the fractal dimension of signal envelopes, and single frequency filtering (SFF), a recent approach with high temporal and spectral resolution that is used to obtain these envelopes. To make a simple and fast decision between speech and non-speech segments, the fractal dimension is computed with the Katz algorithm. This parameter has previously shown its effectiveness for speech activity detection in continuous speech, and this work notably improves its performance. Two corpora, the Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus and the King Saud University Arabic speech corpus, are used to assess the performance of the proposed method. The results show that the proposed method performs reliably compared with well-known methods. The proposed approach can separate speech and non-speech segments in both noisy and clean speech, and it requires neither training data nor any assumption about non-speech at the beginning of the audio.
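The two building blocks named above, SFF envelopes and the Katz fractal dimension, can be sketched as follows. The SFF step follows the commonly described formulation (shift the frequency of interest to half the sampling rate, then apply a single-pole filter with its pole at z = -r); the value r = 0.995, the analysis frequencies, and the 20 ms frames with a 10 ms hop at 16 kHz are assumptions. Katz's dimension is computed as D = log10(n) / (log10(n) + log10(d/L)), where L is the total curve length, d the largest distance from the first sample, and n the number of steps. The paper's speech/non-speech decision rule and thresholds are not reproduced; the sketch stops at the per-frame features, and the function names are illustrative.

```python
import numpy as np

def sff_envelope(x, fs, f_hz, r=0.995):
    """Single frequency filtering (SFF) envelope of x at f_hz.

    Shifts f_hz to fs/2 and applies y[n] = -r * y[n-1] + x_shifted[n],
    a single-pole filter whose pole at z = -r sits at fs/2. r is assumed.
    """
    x = np.asarray(x, dtype=float)
    omega = 2.0 * np.pi * f_hz / fs
    n = np.arange(len(x))
    shifted = x * np.exp(1j * (np.pi - omega) * n)
    y = np.empty(len(x), dtype=complex)
    prev = 0j
    for i, v in enumerate(shifted):
        prev = -r * prev + v
        y[i] = prev
    return np.abs(y)

def katz_fd(segment):
    """Katz fractal dimension: log10(n) / (log10(n) + log10(d / L))."""
    s = np.asarray(segment, dtype=float)
    steps = np.abs(np.diff(s))
    curve_len = steps.sum()                  # L: total length of the curve
    if curve_len == 0.0:
        return 1.0                           # flat segment behaves like a line
    n_steps = len(steps)                     # n = L / (mean step length)
    extent = np.max(np.abs(s[1:] - s[0]))    # d: farthest distance from first sample
    log_n = np.log10(n_steps)
    denom = log_n + np.log10(extent / curve_len)
    if denom <= 0.0:
        return float("nan")                  # degenerate zig-zag: formula undefined
    return float(log_n / denom)

def sff_katz_features(x, fs, freqs_hz, frame_len, hop, r=0.995):
    """Katz FD of each SFF envelope per frame, shape (len(freqs_hz), n_frames)."""
    starts = range(0, len(x) - frame_len + 1, hop)
    rows = []
    for f in freqs_hz:
        env = sff_envelope(x, fs, f, r)
        rows.append([katz_fd(env[s:s + frame_len]) for s in starts])
    return np.array(rows)

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    x = rng.standard_normal(fs // 10)        # 100 ms of white noise as a stand-in
    feats = sff_katz_features(x, fs, [500, 1000, 2000], frame_len=320, hop=160)
    print(feats.shape)                       # (3, 9)
```

A straight-line segment gives D = 1, and more irregular envelopes generally give larger values; because the measure needs only sums and a maximum per frame, it is cheap enough for a fast frame-level decision.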