Interspeech 2016
DOI: 10.21437/interspeech.2016-490

Formant Estimation and Tracking Using Deep Learning

Abstract: Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the former task the input is a stationary speech segment such as the middle part of a vowel and the goal is to estimate the formant frequencies, whereas in the latter task the input is a series of speech frames and the goal is to track the trajectory of the formant frequencies throughout the signal. Traditionally, formant estimation and tracking is done using ad-hoc signal processing methods. In this pape…

Cited by 12 publications
(9 citation statements)
References 25 publications
“…"KARMA" denotes the state-of-the-art KF-based formant tracking method published in [14]. "DeepF" (DeepFormants) denotes the deep-learning based formant tracking method proposed recently in [16], [18], [51]. It is worth emphasizing that DeepF is based on supervised learning and calls for an annotated speech corpus to be trained.…”
Section: E Experiments On Natural Speech Data (mentioning)
confidence: 99%
“…However, it should be mentioned here that there are a few exceptions, such as [15], which uses a non-negative matrix factorization (NMF)-based source-filter modeling of speech signals. Recently, deep learning-based techniques [16]-[18] have also been studied as alternatives to conventional statistical signal processing-based formant estimation and tracking methods. These methods, however, are based on supervised machine learning, which calls for having annotated speech corpora with which to obtain the ground truth formant frequencies for system training.…”
Section: Introduction (mentioning)
confidence: 99%
“…The aforementioned ad-hoc signal processing methods [17] usually emerge false peaks and formant merging when affected by high pitch or coarticulation. These problems can be alleviated by visually correcting with the help of linguistic knowledge and spectral analysis.…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, Mehta et al evaluated their proposed Kalman-based autoregressive moving average modeling methods on this database [9]. Inspired by the great success of deep learning in many application areas, Dissen et al employed Long Short-Term Memory (LSTM) networks to train a supervised regression model between LPCCs plus Pitch-Synchronous Cepstrum Coefficients (named PSCCs) and hand-corrected formant frequencies for every speech frame [17]. Later, Dissen et al [19] explored the potential of raw spectrograms (55 × 50 PSCCs) for formant tracking with Convolutional LSTM networks [20] and found that incorporating the PSCCs and LPCCs achieved the better general performance than using them separately.…”
Section: Introduction (mentioning)
confidence: 99%
“…ity [22][23][24][25][26]. Our methods rely on several advances over existing computational systems: novel representations of the speech signal and new structured prediction and deep learning algorithms.…”
Section: Introduction (mentioning)
confidence: 99%