2017
DOI: 10.1121/1.4979162

Estimating the spectral tilt of the glottal source from telephone speech using a deep neural network

Abstract: Estimation of the spectral tilt of the glottal source has several applications in speech analysis and modification. However, direct estimation of the tilt from telephone speech is challenging due to vocal tract resonances and distortion caused by speech compression. In this study, a deep neural network is used for the tilt estimation from telephone speech by training the network with tilt estimates computed by glottal inverse filtering. An objective evaluation shows that the proposed technique gives more accur…

Citations: cited by 9 publications (9 citation statements)
References: 9 publications
“…For each signal, the standard acoustic correlates for prominence, namely, energy, F0, and duration were computed (section 2.1) together with several tilt measures that have been commonly used in the literature (section 2.2). In addition, a DNN-based spectral tilt estimation was also evaluated (similar to [26]) in order to investigate (i) the efficiency of DNN-based source tilt estimation for prominence, and (ii) the potential for the DNN to add robustness on tilt estimation for noisy signals (section 2.3). For all features, a number of aggregate statistical measures over words were then computed, and their capability to discriminate prominent from non-prominent words was measured in terms of the separability of the feature distributions.…”
Section: Methods (mentioning)
confidence: 99%
“…A new method was proposed recently in [26] to estimate and parameterize the glottal source spectrum in noisy, non-ideal conditions where conventional GIF analysis cannot be used due to its known sensitivity to noise [28]. The method proposed in [26] uses a deep neural network (DNN) to map an input feature vector (the logarithmic speech power spectrum) into an output vector (all-pole model of the glottal source spectrum parameterized using line spectrum frequencies (LSFs)).…”
Section: DNN-based Spectral Tilt Estimation (mentioning)
confidence: 99%
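
The mapping described in the statement above (logarithmic speech power spectrum in, LSFs of an all-pole glottal source model out) can be illustrated with a minimal sketch. This is not the implementation from [26]: the frame length, FFT size, layer sizes, model order, random untrained weights, and the softmax-based output constraint that keeps the LSFs ordered are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def log_power_spectrum(frame, n_fft=256):
    """Log power spectrum of one Hann-windowed frame (the network input)."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)
    return np.log(np.abs(spectrum) ** 2 + 1e-10)  # small floor avoids log(0)

def init_mlp(sizes, seed=0):
    """Random (untrained) weights for a small feed-forward network."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def predict_lsfs(features, params):
    """Forward pass: features -> hidden layers -> ordered LSFs in (0, pi).

    Assumption: the last layer has order + 1 outputs. A softmax turns them
    into positive increments that sum to 1; the cumulative sum of the first
    `order` increments, scaled by pi, is strictly increasing and below pi.
    """
    h = features
    for weights, bias in params[:-1]:
        h = np.tanh(h @ weights + bias)
    weights, bias = params[-1]
    out = h @ weights + bias
    increments = np.exp(out - out.max())
    increments /= increments.sum()
    return np.pi * np.cumsum(increments)[:-1]

# Hypothetical setup: 25 ms frame at 8 kHz, 10th-order all-pole model.
order = 10
frame = np.random.default_rng(1).standard_normal(200)  # stand-in for speech
features = log_power_spectrum(frame)                   # 129 spectral bins
params = init_mlp([features.size, 64, order + 1])
lsfs = predict_lsfs(features, params)
```

With untrained weights the output values are meaningless; the sketch only shows the shape of the input and output vectors. In the setting the abstract describes, the training targets would come from glottal inverse filtering.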