ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9747550

Towards Identity Preserving Normal to Dysarthric Voice Conversion

Abstract: We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision-making processes and alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition. This is an especially challenging task, since the converted samples should capture the severity of dysarthric speech while being highly natural and possessing the speaker identity of the normal speaker. To this end, we ad…

Citations: cited by 12 publications (2 citation statements)
References: 18 publications
“…The TTS model simply fine-tuned a pretrained model on ELSIMU2. For the VC setup, we followed the same parallel VC using many-to-one setup as described in [47]. Since our dataset only contains one source healthy speaker, we generated multiple source healthy speakers using a pretrained TTS model to create a larger parallel corpus.…”
Section: B Implementation Details (mentioning)
confidence: 99%
“…The large systematic mismatch between dysarthric and typical speech, high intra- and inter-speaker variabilities, and data scarcity are three major challenges in ADSR. To enhance the performance of ADSR systems, previous studies have employed data augmentation (e.g., speed perturbation [6,7] and voice conversion [8]), high-level speech representations (e.g., bottleneck [9] and autoencoder bottleneck [10] features) and multi-modal representations…”
Section: Introduction (mentioning)
confidence: 99%