Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the estimation task, the input is a stationary speech segment, such as the middle part of a vowel, and the goal is to estimate the formant frequencies, whereas in the tracking task the input is a series of speech frames and the goal is to track the trajectory of the formant frequencies throughout the signal. We propose using supervised machine learning techniques trained on an annotated corpus of read speech for these tasks. We evaluated two sets of deep network architectures: feedforward networks and convolutional networks. The input to the former is composed of LPC-based cepstral coefficients with a range of model orders and pitch-synchronous cepstral coefficients, whereas the input to the latter is the raw spectrogram. The performance of our method compares favorably with alternative methods for formant estimation and tracking. We further propose a change in the network architecture that allows adaptation of the models to new domains and speaker types. We evaluated our adapted networks on three datasets, each of which had different speaker characteristics and speech styles. After adaptation, performance improves further, and the models can handle a variety of conditions.
Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the former task, the input is a stationary speech segment, such as the middle part of a vowel, and the goal is to estimate the formant frequencies, whereas in the latter task the input is a series of speech frames and the goal is to track the trajectory of the formant frequencies throughout the signal. Traditionally, formant estimation and tracking are done using ad hoc signal processing methods. In this paper we propose using machine learning techniques trained on an annotated corpus of read speech for these tasks. Our feature set is composed of LPC-based cepstral coefficients with a range of model orders and pitch-synchronous cepstral coefficients. Two deep network architectures are used as learning algorithms: a deep feed-forward network for the estimation task and a recurrent neural network for the tracking task. The performance of our methods compares favorably with mainstream LPC-based implementations and state-of-the-art tracking algorithms.
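The LPC-based cepstral features with a range of model orders mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the model orders (8, 10, 12), the number of cepstral coefficients, the Hamming window, and the small diagonal loading are all assumptions chosen for illustration, and the pitch-synchronous features are not covered. The sketch fits an all-pole model with the autocorrelation method, converts the predictor coefficients to cepstral coefficients with the standard recursion, and concatenates the results across orders.

```python
import numpy as np


def autocorr_lpc(frame, order):
    """Return [1, a_1, ..., a_p] for A(z) = 1 + sum_k a_k z^-k (autocorrelation method).

    Assumes the frame is longer than `order` and is not all zeros.
    """
    x = np.asarray(frame, dtype=float)
    # Unnormalized autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Tiny diagonal loading (an assumption) to keep the system well conditioned.
    r[0] *= 1.0 + 1e-9
    idx = np.arange(order)
    toeplitz = r[np.abs(idx[:, None] - idx[None, :])]
    # Normal equations: sum_k a_k r[|i - k|] = -r[i] for i = 1..p.
    a = np.linalg.solve(toeplitz, -r[1:])
    return np.concatenate(([1.0], a))


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n of the all-pole model 1 / A(z) (gain term omitted)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)  # c[0] is left unused
    for n in range(1, n_ceps + 1):
        # Only terms with 1 <= n - m <= p contribute to the recursion.
        acc = sum(m * c[m] * a[n - m] for m in range(max(1, n - p), n))
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]


def lpc_cepstral_features(frame, orders=(8, 10, 12), n_ceps=12):
    """Concatenate LPC cepstra over several model orders (hypothetical settings)."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    return np.concatenate(
        [lpc_to_cepstrum(autocorr_lpc(windowed, p), n_ceps) for p in orders]
    )
```

Calling `lpc_cepstral_features(frame)` on a single speech frame returns one vector of length `len(orders) * n_ceps`, which could serve as the per-frame input to a feedforward network. Stacking several orders is one plausible way to avoid committing to a single LPC order.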