“…The feature extraction methods explored in stuttering recognition systems include: autocorrelation function and envelope parameters [78]; duration, energy peaks, and spectral features of word-based and part-word-based segments [79][80][81]; age, sex, type of disfluency, frequency of disfluency, duration, physical concomitants, rate of speech, historical, attitudinal, and behavioral scores, and family history [38]; duration and frequency of disfluent portions, and speaking rate [26]; frequency, 1st-to-3rd formant frequencies and their amplitudes [81,82]; spectral measures (512-point fast Fourier transform (FFT)) [83,84]; mel-frequency cepstral coefficients (MFCCs) [81,85-87]; linear predictive cepstral coefficients (LPCCs) [81,86]; pitch and shimmer [88]; zero-crossing rate (ZCR) [81]; short-time average magnitude and spectral spread [81]; linear predictive coefficients (LPCs) and weighted linear prediction cepstral coefficients (WLPCCs) [86]; maximum autocorrelation value (MACV) [81]; linear prediction-Hilbert transform based MFCC (LH-MFCC) [89]; noise-to-harmonic ratio, shimmer, harmonic-to-noise ratio, harmonicity, amplitude perturbation quotient, formants and their statistics (min, max, mean, median, mode, std), and spectrum centroid [88]; Kohonen's self-organizing maps [84]; i-vectors [90]; perceptual linear prediction (PLP) coefficients [87]; respiratory biosignals [39]; and the sample entropy feature [91]. With recent developments in convolutional neural networks, the feature representation of stuttered speech is moving from conventional MFCCs toward spectrogram representations.…”
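A few of the simpler features named above, zero-crossing rate, short-time average magnitude, and the FFT-based spectrogram now favored as CNN input, can be sketched with NumPy alone. This is a minimal illustration, not code from any of the cited systems; the frame length, hop size, and synthetic 200 Hz test tone are assumed values chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Slice a 1-D signal into overlapping frames -> (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes within each frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def short_time_avg_magnitude(frames):
    """Mean absolute amplitude per frame (a crude energy measure)."""
    return np.mean(np.abs(frames), axis=1)

def magnitude_spectrogram(frames):
    """Windowed magnitude spectrum per frame -> (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))

# Synthetic stand-in for a speech recording: 1 s of a 200 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t)

frames = frame_signal(x)                 # (61, 512)
zcr = zero_crossing_rate(frames)         # one value per frame, in [0, 1]
mag = short_time_avg_magnitude(frames)   # one value per frame
spec = magnitude_spectrogram(frames)     # (61, 257) time-frequency image
print(frames.shape, spec.shape)
```

The `spec` array is exactly the kind of 2-D time-frequency representation that spectrogram-based convolutional models consume in place of an MFCC matrix; MFCCs themselves would add a mel filterbank, a log, and a discrete cosine transform on top of this magnitude spectrogram.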