2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016
DOI: 10.1109/icassp.2016.7472678

Online speaking rate estimation using recurrent neural networks

Abstract: A reliable online speaking rate estimation tool is useful in many domains, including speech recognition, speech therapy intervention, speaker identification, etc. This paper proposes an online speaking rate estimation model based on recurrent neural networks (RNNs). Speaking rate is a long-term feature of speech, which depends on how many syllables were spoken over an extended time window (seconds). We posit that since RNNs can capture long-term dependencies through the memory of previous hidden states, they a…

Cited by 15 publications (9 citation statements)
References 21 publications
“…The approach does not require manual segmentation and correlates strongly with the more cumbersome approach that requires hand-labeling of 19 Speech rate was measured using the feature-based approach. 20,21 Furthermore, the rate was estimated for every 3-s segment in the speech signal with a 2-s overlap. This allowed an estimation of the average rate for the entire passage and the variation in speech rate.…”
Section: Automated Acoustic Analysis (mentioning)
confidence: 99%
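
The windowing scheme described in the statement above (a rate estimate for every 3-s segment with a 2-s overlap, then an average and a measure of variation over the passage) can be illustrated with a minimal Python sketch. It is not the cited authors' implementation: it assumes syllable onset times are already available as a list of seconds, and the function name `windowed_rates`, its parameters, and the sample timestamps are hypothetical.

```python
from statistics import mean, pstdev


def windowed_rates(syllable_times, duration, window=3.0, hop=1.0):
    """Return syllables per second for each full window of `window` seconds.

    A 3-s window advanced by a 1-s hop gives the 2-s overlap described above.
    `syllable_times` (onsets in seconds) is an assumed input; how the onsets
    are detected is outside the scope of this sketch.
    """
    rates = []
    i = 0
    # Use an integer index so window starts do not accumulate float error.
    while i * hop + window <= duration + 1e-9:
        start = i * hop
        end = start + window
        count = sum(1 for t in syllable_times if start <= t < end)
        rates.append(count / window)
        i += 1
    return rates


# Hypothetical onsets for a 5-second passage.
onsets = [0.2, 0.6, 1.1, 1.5, 2.3, 2.9, 3.4, 4.1, 4.8]
rates = windowed_rates(onsets, duration=5.0)
average_rate = mean(rates)      # average rate over the passage
rate_variation = pstdev(rates)  # spread of the per-window rates
```

Shorter hops give a smoother rate contour at the cost of more windows; the trailing partial window is dropped here, which is one of several reasonable choices.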
“…Recently, there has been a surge of research using speech analytics to detect and track a range of neurological diseases such as Parkinson's (Orozco-Arroyave et al, 2016a,b; Hsu et al, 2017; Benba et al, 2015) and ALS ill; Norel et al, 2018; Wang et al, 2016a,b, 2018. Efforts towards tracking disease progression in this area have typically focused on the estimation of speech specific measures such as speech intelligibility (Berisha et al, 2013; Kim et al, 2015), speaking rate (Jiao et al, 2016; Martens et al, 2015), or severity (Tu et al, 2017; Asgari and Shafran, 2010). While these efforts have shown success in the ability to objectively measure functional changes directly related to speech, whether speech can be used to measure functional impairment along other tasks in ALS remains largely unexplored.…”
Section: Introduction (mentioning)
confidence: 99%
“…To verify our hypothesis, we created another experiment based on the TIMIT dataset. It comes with phoneme and word level annotations, from which the speaking rate (defined as syllables per second) can be computed for each input sample [40]. To reduce the influence of the different acoustical environment in TIMIT compared to Libri Speech, we retrained the CRNN classification model on the TIMIT training dataset, using the same parameters as described in Section 4.…”
Section: Ablation Analysis (mentioning)
confidence: 99%
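
As a rough illustration of how a syllables-per-second rate might be derived from TIMIT-style phone annotations, the Python sketch below counts syllable nuclei in the lines of a `.PHN` file (each line: start sample, end sample, phone label). Several things here are assumptions rather than the citing paper's method: syllables are approximated by counting vowel and syllabic-consonant phones, the `NUCLEI` label set is an assumed subset of the TIMIT phone inventory, leading and trailing `h#` silence is excluded from the duration, and the function name `speaking_rate` is hypothetical.

```python
SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz

# Assumed set of phone labels treated as syllable nuclei (vowels plus
# syllabic consonants); adjust to the label set actually in use.
NUCLEI = {
    "iy", "ih", "eh", "ey", "ae", "aa", "aw", "ay", "ah", "ao", "oy", "ow",
    "uh", "uw", "ux", "er", "ax", "ix", "axr", "ax-h", "em", "en", "eng", "el",
}


def speaking_rate(phn_lines):
    """Approximate syllables per second from '<start> <end> <phone>' lines."""
    segments = []
    for line in phn_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        segments.append((int(parts[0]), int(parts[1]), parts[2]))

    # Drop utterance-boundary silence so it does not dilute the rate.
    speech = [seg for seg in segments if seg[2] != "h#"]
    if not speech:
        return 0.0

    duration = (speech[-1][1] - speech[0][0]) / SAMPLE_RATE
    nuclei = sum(1 for _, _, phone in speech if phone in NUCLEI)
    return nuclei / duration if duration > 0 else 0.0


# Hypothetical annotation lines in the .PHN layout.
example = [
    "0 1600 h#",
    "1600 4800 sh",
    "4800 8000 iy",
    "8000 12800 hv",
    "12800 17600 ae",
    "17600 20000 h#",
]
rate = speaking_rate(example)  # 2 nuclei over 1.0 s of speech
```

Counting nuclei is only a proxy for a syllable count; a syllabified lexicon applied to the word-level annotations would be an alternative.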