The magnitude spectrum is a popular mathematical tool for speech signal analysis. In this paper, we propose a new technique for improving the performance of the magnitude spectrum by utilizing the benefits of the group delay (GD) spectrum to estimate the characteristics of the vocal tract accurately. The traditional magnitude spectrum suffers from difficulties when estimating vocal tract characteristics, particularly for high-pitched speech, owing to its low resolution and high spectral leakage. Phase-domain analysis shows that the GD spectrum has low spectral leakage and high resolution owing to its additive property. Thus, the magnitude spectrum modified with its GD spectrum, referred to as the modified spectrum, is found to significantly improve formant frequency estimation over traditional methods. The accuracy is tested on synthetic vowels for a wide range of fundamental frequencies up to the high-pitched female speaker range. The validity of the proposed method is also verified by inspecting the formant contour of an utterance from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database and a standard F2-F1 plot of natural vowels spoken by male and female speakers. The results are compared with two state-of-the-art methods, and the proposed method performs better than both.
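As an illustration of the GD spectrum mentioned in this abstract, the following is a minimal sketch of one common way to compute it without phase unwrapping. It uses the standard identity that the group delay (the negative derivative of the phase with respect to frequency) equals (X_R Y_R + X_I Y_I) / |X|^2, where X is the DFT of x[n] and Y is the DFT of n·x[n]. The function name, the FFT length, and the `eps` floor are assumptions made here for illustration. The abstract does not say how the authors combine the GD spectrum with the magnitude spectrum, so that step is not shown.

```python
import numpy as np

def group_delay_spectrum(x, n_fft=512, eps=1e-12):
    """Group delay (in samples) of a frame x at n_fft//2 + 1 frequency bins.

    Hypothetical helper: computes the plain GD spectrum only, not the
    "modified spectrum" described in the abstract.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # DFT of x[n]
    Y = np.fft.rfft(n * x, n_fft)    # DFT of n * x[n]
    power = X.real ** 2 + X.imag ** 2
    # eps guards against division by zero at spectral nulls (an assumption).
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)
```

For a unit impulse delayed by d samples, this returns a flat group delay of d at every bin, which is a convenient sanity check.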
Estimating the formant frequencies of high-pitched speech is essential in many speech processing applications. Unfortunately, most existing methods cannot accurately estimate the formant frequencies of high-pitched speech. Moreover, the available formant estimators are not robust to noise. In this paper, we propose a higher-order group delay (GD) spectrum-based deconvolution method that estimates the formants of high-pitched noisy speech with higher accuracy. Although the cepstrum is known to provide source-filter separation to some extent, it is affected by ambient noise. We apply the spectral-root deconvolution technique to the third-order GD spectrum, which yields a noise-robust cepstrum. The resulting cepstrum is found to produce a significant improvement when estimating formant frequencies. We evaluated the proposed method on five synthetic vowels by calculating the estimation error of the formant frequencies, and on natural vowels spoken by male and female speakers using standard F2-F1 plots. An utterance from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database was used to plot the formant contours on the corresponding spectrogram. We compared the results with three state-of-the-art methods. The proposed technique outperforms all three, particularly for high-pitched speech in a noisy environment.
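To make the idea of spectral-root deconvolution more concrete, here is a minimal sketch in which the logarithm of the conventional cepstrum is replaced by a power law, |X|^gamma with 0 < gamma < 1. This sketch operates on the ordinary magnitude spectrum. The abstract applies the technique to a third-order GD spectrum that it does not define, so that part is not reproduced. The function names, `gamma=0.5`, and the lifter length are assumptions chosen for illustration.

```python
import numpy as np

def spectral_root_cepstrum(x, gamma=0.5, n_fft=512):
    """Root cepstrum: inverse DFT of |X|**gamma (hypothetical helper)."""
    X = np.fft.fft(np.asarray(x, dtype=float), n_fft)
    root_spec = np.abs(X) ** gamma
    # |X|**gamma is real and even for a real input, so the result is real.
    return np.fft.ifft(root_spec).real

def lifter_low_time(cep, n_keep):
    """Keep the first n_keep quefrencies (and their mirror image) and
    return the smoothed root-domain spectrum."""
    lifter = np.zeros(len(cep))
    lifter[:n_keep] = 1.0
    lifter[len(cep) - n_keep + 1:] = 1.0   # negative-quefrency mirror
    return np.fft.fft(cep * lifter).real
```

A small `n_keep` retains the slowly varying (vocal-tract-like) part of the spectrum, while pitch-related ripple lives at higher quefrencies and is discarded.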
Speech signal analysis based on the phase spectrum (PS) has become more popular among researchers in recent years because of its attractive properties. Although some problems arise during its processing, appropriate modification yields useful results. In this study, we propose a new method of speech signal analysis based on the phase spectral representation. After analyzing the PS, we modify it into a vocal-tract-dominated spectrum by utilizing the minimum-phase stable component of the speech spectrum. The modified signal, which holds only the phase spectral component, is used for parametric modeling of the speech signal to estimate the characteristics of the vocal tract accurately. The extracted feature was found to provide evidence complementary to the magnitude spectrum of speech, but with better resolution. According to perceptual analysis, the PS takes precedence over the magnitude spectrum. This research uses the group delay (GD) spectrum as a representative of the PS because of its meaningful characteristics. All-pole modeling is performed on the signal derived from the GD spectrum to find the resonances of the vocal tract system accurately. The effectiveness of the method is tested on synthesized vowels over a range of pitch periods, from low-pitched to high-pitched speech. The validity of the proposed method is also verified by plotting the formant contour on the spectrogram of a sentence from the TIMIT database and a standard F2-F1 plot of natural speech spoken by male and female speakers. The proposed method performs better than the state-of-the-art methods.
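The all-pole modeling step mentioned above can be sketched with conventional linear prediction (autocorrelation method), followed by reading resonance frequencies from the angles of the predictor polynomial's roots. This is a generic sketch and not the authors' implementation: in the abstract the model is fitted to a signal derived from the GD spectrum, whereas here it is fitted directly to a time-domain frame. The function names and the model order are assumptions.

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """All-pole coefficients [1, a1, ..., ap] via the autocorrelation method."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    # Toeplitz normal equations: R a = -r[1..order].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def formants_from_lpc(a, fs):
    """Resonance frequencies (Hz) from roots in the upper half-plane."""
    roots = np.roots(a)
    roots = roots[roots.imag > 0]          # one root per conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```

For real speech, a pre-emphasis filter, a window, and a bandwidth threshold on the roots are typically added before interpreting the root angles as formants; they are omitted here to keep the sketch short.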