2023
DOI: 10.3390/app13137579

Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths

Abstract: Speech is critical for interpersonal communication, but not everyone has fluent communication skills. Speech disfluency, including stuttering and interruptions, affects not only emotional expression but also clarity of expression for people who stutter. Existing methods for detecting speech disfluency rely heavily on annotated data, which can be costly. Additionally, these methods have not considered the issue of variable-length disfluent speech, which limits the scalability of detection methods. To address th…

Citations: cited by 3 publications (2 citation statements)
References: 48 publications
“…The most widely studied speech disorder is stuttering [14][15][16][17][18][19]. The developed approaches to detect stuttering events usually rely on two datasets, SEP-28k [20] and FluencyBank [21] which are used to train, validate and compare the proposed approaches.…”
Section: Introduction
confidence: 99%
“…Several approaches have been proposed, for instance, MFCC features have been used to train a neural network comprising an LSTM layer [15]. However, most of the approaches today [14,[17][18][19] use Wav2Vec2 [1]. Wav2vec2 has the advantage of taking a raw audio waveform as input, thus avoiding a laborious step of parameters selection for the feature extraction, and to learn features with a reasonable amount of data as it has many pre-trained models in different languages and for different tasks.…”
Section: Introduction
confidence: 99%