2017
DOI: 10.4103/2228-7477.199156
An Automatic Prolongation Detection Approach in Continuous Speech With Robustness Against Speaking Rate Variations

Abstract: In recent years, many methods have been introduced to support the diagnosis of stuttering through automatic detection of prolongation in the speech of people who stutter. However, less attention has been paid to treatment processes in which clients learn to speak more slowly. The aim of this study was to develop a method to help speech-language pathologists (SLPs) during diagnosis and treatment sessions. To this end, speech signals were initially parameterized to perceptual linear predictive (PLP) features. To…

Cited by 7 publications (8 citation statements). References 15 publications.
“…In the unsupervised approach, quasi-silent areas of the signal are first removed using an automatic speech detection model and then the similarity between successive frames is used to produce initial estimates of possible prolongation segments. If the duration of the detected segment is found to be greater than a predefined threshold, the segment is labelled as a prolongation event; otherwise, it is considered a normal segment [22].…”
Section: Literature Review (citation type: mentioning, confidence: 99%)
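The unsupervised scheme quoted above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: cosine similarity, the threshold values, and the function name are all assumptions made here for clarity.

```python
import numpy as np

def detect_prolongations(frames, sim_threshold=0.9, min_frames=20):
    """Label long runs of near-identical frames as prolongation events.

    frames: (n_frames, n_features) array of speech features (e.g. PLP),
    with quasi-silent frames assumed already removed.
    sim_threshold and min_frames are illustrative values, not the paper's.
    Returns a list of (start_frame, end_frame) candidate events.
    """
    # Cosine similarity between each pair of successive frames.
    a, b = frames[:-1], frames[1:]
    sims = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12)

    events, run_start = [], None
    for i, s in enumerate(sims):
        if s >= sim_threshold:
            if run_start is None:
                run_start = i          # a run of similar frames begins
        else:
            if run_start is not None and i - run_start >= min_frames:
                events.append((run_start, i))  # long enough: prolongation
            run_start = None
    if run_start is not None and len(sims) - run_start >= min_frames:
        events.append((run_start, len(sims)))
    return events
```

A run of similar frames shorter than `min_frames` is discarded as a normal segment, mirroring the duration check described in the quote.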
“…Stutterers usually have a lower speaking rate than normal speakers, so thresholds for detecting prolongations had to account for natural variations in fluency and speaking rate on different occasions [32]. We followed the unsupervised approach of [22,33], which uses two thresholds to decide whether two successive frames were similar, and whether the duration of similar frames was sufficient to count as a prolongation. We found empirically that 0.9 was the best value for the first; but the second threshold had to be set dynamically, according to speaking rate.…”
Section: Prolongation Detection System (citation type: mentioning, confidence: 99%)
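The quote says the duration threshold was set dynamically according to speaking rate, but does not give the formula. One plausible rule, shown purely as an assumed illustration: scale a base duration inversely with the observed syllable rate, so slower speakers need a longer steady segment before it counts as a prolongation.

```python
def dynamic_duration_threshold(base_ms, speaking_rate, reference_rate=4.0):
    """Hypothetical speaking-rate-adaptive duration threshold.

    base_ms: minimum prolongation duration (ms) at the reference rate.
    speaking_rate: observed rate in syllables per second.
    reference_rate: assumed typical rate (4 syllables/s is a common
    ballpark for conversational speech, not a value from the paper).
    """
    # Slower speech (lower rate) proportionally raises the threshold.
    return base_ms * (reference_rate / speaking_rate)
```

With this rule, a client speaking at half the reference rate would need a segment twice as long before it is flagged, which is the qualitative behaviour the quote describes.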
“…The various feature extraction methods that have been explored in stuttering recognition systems are: autocorrelation function and envelope parameters [78]; duration, energy peaks, and spectra of word-based and part-word-based units [79-81]; age, sex, type of disfluency, frequency of disfluency, duration, physical concomitant, rate of speech, historical, attitudinal and behavioral scores, and family history [38]; duration and frequency of disfluent portions and speaking rate [26]; frequency, 1st to 3rd formant frequencies and their amplitudes [81,82]; spectral measure (512-point fast Fourier transform (FFT)) [83,84]; mel-frequency cepstral coefficients (MFCCs) [81,85-87]; linear predictive cepstral coefficients (LPCCs) [81,86]; pitch and shimmer [88]; zero-crossing rate (ZCR) [81]; short-time average magnitude and spectral spread [81]; linear predictive coefficients (LPC) and weighted linear prediction cepstral coefficients (WLPCC) [86]; maximum autocorrelation value (MACV) [81]; linear prediction-Hilbert transform based MFCC (LH-MFCC) [89]; noise-to-harmonic ratio, shimmer, harmonic-to-noise ratio, harmonicity, amplitude perturbation quotient, formants and their variants (min, max, mean, median, mode, std), and spectrum centroid [88]; Kohonen's self-organizing maps [84]; i-vectors [90]; perceptual linear predictive (PLP) features [87]; respiratory biosignals [39]; and the sample entropy feature [91]. With the recent developments in convolutional neural networks, the feature representation of stuttered speech is moving from conventional MFCCs towards spectrogram representations.…”
Section: Statistical Approaches (citation type: mentioning, confidence: 99%)
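Two of the simpler features in this list, short-time energy and zero-crossing rate (ZCR), can be computed directly from the waveform. A sketch in NumPy; the frame and hop sizes are assumptions corresponding to 25 ms / 10 ms at a 16 kHz sampling rate, not values taken from any of the cited papers.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Per-frame short-time energy and zero-crossing rate.

    frame_len=400 and hop=160 assume 25 ms frames with a 10 ms hop
    at 16 kHz. Returns two arrays of length n_frames.
    """
    n = 1 + (len(signal) - frame_len) // hop
    energy = np.empty(n)
    zcr = np.empty(n)
    for i in range(n):
        frame = signal[i * hop : i * hop + frame_len]
        # Mean squared amplitude over the frame.
        energy[i] = np.mean(frame ** 2)
        # Fraction of successive samples whose sign flips.
        zcr[i] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    return energy, zcr
```

For a pure 1 kHz tone at 16 kHz, the ZCR comes out near 2·1000/16000 = 0.125 sign flips per sample, and the energy near 0.5 for a unit-amplitude sine, which is a quick sanity check on the implementation.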
“…Existing literature at the intersection of AI and stuttering focuses on building machine learning systems to identify and classify different types of disfluencies like Blocks, Prolongations [17], Sound Repetitions [47], Interjections, etc. in speech utterances [9].…”
Section: Background and Related Work, 2.1 AI for Stuttering (citation type: mentioning, confidence: 99%)
“…in speech utterances [9]. Such systems are typically trained on speech samples which are annotated for different kinds of disfluencies [2,17,18,36,57]. Other approaches have leveraged data based on facial muscle movements [13], breathing patterns [64], brain activity [30], etc.…”
Section: Background and Related Work, 2.1 AI for Stuttering (citation type: mentioning, confidence: 99%)