2022
DOI: 10.48550/arxiv.2201.04908
Preprint

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Abstract: In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to improve dysarthric speech recognition. We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthr…

Citations: cited by 2 publications (2 citation statements)
References: 16 publications
“…These works, however, require parallel data during training or known speaker identities during test time. In [16] it was shown that speech recognition for dysarthric speech can be enhanced by a simple linear interpolation of an utterance's spectrogram and outperforms cycle-based VC approaches on parallel data.…”
Section: Introduction (mentioning)
confidence: 99%
“…Cycle-consistent and adversarial loss are utilized to train CycleGAN without requiring paired data. It has been applied to many unsupervised domain adaptation tasks, including voice conversion [23][24][25]. In addition to these DNN-based conversion models, traditional signal processing methods, e.g., formant modification [26] and time-scale modification [10], can also be applied to acoustic feature conversion.…”
Section: Introduction (mentioning)
confidence: 99%
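
One citation statement above describes the approach as "a simple linear interpolation of an utterance's spectrogram". The following is a minimal sketch of that general idea, not the authors' implementation (the abstract refers to vocoder-based time stretching, whose details are not given here). The function name `stretch_spectrogram`, the `rate` parameter, and the (frequency bins, frames) array layout are assumptions made for illustration.

```python
import numpy as np


def stretch_spectrogram(spec: np.ndarray, rate: float) -> np.ndarray:
    """Resample a (n_bins, n_frames) spectrogram along the time axis.

    Hypothetical helper: rate > 1 shortens the utterance (fewer frames),
    rate < 1 lengthens it. Each frequency bin is linearly interpolated
    independently over time.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_bins, n_frames = spec.shape
    # Number of output frames after stretching, at least one.
    new_frames = max(1, int(round(n_frames / rate)))
    old_t = np.arange(n_frames)                       # original frame positions
    new_t = np.linspace(0, n_frames - 1, new_frames)  # resampled positions
    out = np.empty((n_bins, new_frames), dtype=float)
    for b in range(n_bins):
        out[b] = np.interp(new_t, old_t, spec[b])
    return out
```

For example, `stretch_spectrogram(spec, 2.0)` would roughly halve the number of frames. Turning the stretched spectrogram back into audio would need a vocoder or a phase reconstruction step, which this sketch leaves out.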