2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
DOI: 10.1109/icassp.2000.861830

On-line speaking rate estimation using Gaussian mixture models

Abstract: Gaussian Mixture Models (GMM) are a widespread tool in applications like speaker identification or verification. In contrast to Hidden Markov Models (HMM), Gaussian Mixture Models are designed to model the general properties of an underlying acoustic source. In our paper we extend the application of GMMs to the assessment of speaking rate. Directly trained on the acoustic data, they can be either applied directly to estimate the speech rate category or, with the help of a mapping function, they can provide a contin…
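
The two uses named in the abstract (picking a rate category directly, or mapping the GMM outputs to a continuous value) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the one-dimensional features, the hand-set GMM parameters, the nominal rate per class, and in particular the posterior-weighted average, which stands in for the paper's mapping function (described by the citing papers as a neural network).

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a frame sequence under one GMM.

    gmm is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total

def classify_rate(frames, gmms):
    """Score the frames under each rate-dependent GMM and pick the best class."""
    scores = {label: gmm_log_likelihood(frames, g) for label, g in gmms.items()}
    return max(scores, key=scores.get), scores

def continuous_rate(scores, nominal_rates):
    """Hypothetical mapping: posterior-weighted average of nominal class rates.

    A simple stand-in for the mapping function; not the method from the paper.
    """
    top = max(scores.values())
    weights = {k: math.exp(s - top) for k, s in scores.items()}
    norm = sum(weights.values())
    return sum(weights[k] / norm * nominal_rates[k] for k in scores)

# Toy, hand-set models (assumed values; real models would be trained on acoustic data).
GMMS = {
    "slow":   [(0.5, [-2.0], [1.0]), (0.5, [-1.0], [1.0])],
    "medium": [(0.5, [-0.5], [1.0]), (0.5, [0.5], [1.0])],
    "fast":   [(0.5, [1.0], [1.0]), (0.5, [2.0], [1.0])],
}
NOMINAL = {"slow": 3.0, "medium": 5.0, "fast": 7.0}  # assumed nominal rate per class

frames = [[1.4], [1.8], [1.1]]  # made-up feature frames
label, scores = classify_rate(frames, GMMS)
rate = continuous_rate(scores, NOMINAL)
print(label, round(rate, 2))
```

The class decision uses only the summed log-likelihoods, while the continuous estimate reuses the same scores, so both outputs come from a single pass over the frames.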

Cited by 21 publications (29 citation statements)
References 5 publications
“…An Euclidean distance is used to estimate this dependency and to discriminate between slow and fast speech. In Falthauser et al (2000), speaking rate dependent GMMs are used to classify speech spurts into slow, medium and fast speech. The output likelihoods of these GMMs are used as input to a neural network whose targets are the actual phonemes.…”
Section: Rate Of Speech (mentioning)
confidence: 99%
“…However, in [13] we used a combination of multiple acoustic features which was not applicable in real time. In [14], Faltlhauser et al proposed an online speaking rate estimation model based on neural networks. They used GMMs to first separate data into three rate groups (fast, moderate, slow) and built a neural network with the input of the likelihood values generated by GMMs.…”
Section: Introduction (mentioning)
confidence: 99%
“…The use of RNNs for speaking rate has not been explored in the literature to the best of our knowledge. Although neural networks (NN) have been used to estimate speaking rate in [14], this model does not exploit the longer-term dependencies that RNNs exploit. Moreover, our algorithm requires training a single RNN, whereas the work in [14] uses a sequential procedure that requires training independent models for slow, moderate, and fast speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…In that direction Jiao et al (2015) proposed a convex optimization based speech rate estimation to avoid dependency on heuristic peak detection strategy. Faltlhauser et al (2000) used the Gaussian mixture model (GMM) for classification of speaking rate into three categories -slow, medium and fast. Following this, they used the class probabilities to estimate speaking rate with the help of Neural Networks.…”
Section: Introduction (mentioning)
confidence: 99%