2020
DOI: 10.1109/lsp.2019.2961213

Voice Conversion for Whispered Speech Synthesis

Abstract: We present an approach to synthesize whisper by applying a handcrafted signal processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech to whispered speech. We investigate using Gaussian Mixture Models (GMM) and Deep Neural Networks (DNN) to model the mapping between acoustic features of normal speech and those of whispered speech. We evaluate naturalness and speaker similarity of the converted whisper on an internal corpus and on the publicly available wTIMIT corpus. We show t…

Cited by 27 publications (8 citation statements)
References 35 publications

“…Parallel WaveGAN (PWG) [20] was used as the neural vocoder. We followed an open-source implementation 3 . The training data of PWG contained the audio recordings of all control speakers in UASpeech.…”
Section: Methods (mentioning)
confidence: 99%
“…Neural voice conversion (VC) has substantially improved the naturalness of synthesized speech in a wide range of tasks, including read speech [1], emotional speech [2] and whispered speech [3]. However, pathological VC (and TTS too) is a largely unexplored area, which has several interesting applications.…”
Section: Introduction (mentioning)
confidence: 99%
“…Voice Conversion (VC) is the task of modifying an utterance from a source speaker to make it sound like it was uttered by a target speaker, while preserving the original linguistic content [1]. VC is a key component of many modern applications, including text-to-speech (TTS) [2], speech enhancement [3], and speaking assistance [4] systems. Due to its success in these fields, VC has been studied extensively in recent years [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Most existing works on whisper and Lombard speech synthesis analyze and generate these speaking styles separately. Because whisper speech lacks the vibration element of voiced speech, the fundamental frequency (f0) and spectral tilt can be modified in a source-filter model to convert normal speech into whisper speech [10,11]. Other methods [12,11] learn the mapping between normal and whisper acoustic features through voice conversion (VC) techniques.…”
Section: Introduction (mentioning)
confidence: 99%
“…Because whisper speech lacks the vibration element of voiced speech, the fundamental frequency (f0) and spectral tilt can be modified in a source-filter model to convert normal speech into whisper speech [10,11]. Other methods [12,11] learn the mapping between normal and whisper acoustic features through voice conversion (VC) techniques. However, parallel corpora containing both normal and whisper recordings are usually needed for training a VC model.…”
Section: Introduction (mentioning)
confidence: 99%
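
The citing statements above describe a source-filter route to whisper: remove the voiced (periodic) excitation and adjust the spectral tilt. The following is a minimal sketch of that general idea, not the recipe used in the paper or in any citing work. It assumes NumPy only, estimates a smooth spectral envelope per frame by cepstral liftering, and resynthesizes each frame with random phase so that no pitch harmonics remain. The function name `whisperize` and the parameters `n_cep` and `tilt_db_per_octave` are hypothetical, and both the direction and the size of the tilt change are illustrative assumptions.

```python
import numpy as np


def whisperize(x, sr, frame=512, hop=128, n_cep=30,
               tilt_db_per_octave=3.0, seed=0):
    """Toy source-filter 'whisperization' (illustrative sketch only).

    Per frame: keep a smooth spectral envelope (the "filter"), discard
    the periodic excitation by using random phase (a noise-like
    "source"), and apply a simple high-frequency tilt.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)

    # Assumed tilt: gain in dB grows with octaves above 1 kHz
    # (frequencies below 50 Hz are clamped to avoid log2(0)).
    octaves = np.log2(np.maximum(freqs, 50.0) / 1000.0)
    tilt = 10.0 ** (tilt_db_per_octave * octaves / 20.0)

    # Pad one frame on each side so that the region we return is
    # fully covered by overlapping windows.
    xp = np.pad(x, frame)
    y = np.zeros_like(xp)
    norm = np.zeros_like(xp)

    for start in range(0, len(xp) - frame + 1, hop):
        seg = xp[start:start + frame] * window
        log_mag = np.log(np.abs(np.fft.rfft(seg)) + 1e-9)

        # Cepstral liftering: keep only low quefrencies, which removes
        # the fine harmonic structure caused by the pitch.
        cep = np.fft.irfft(log_mag, n=frame)
        cep[n_cep:frame - n_cep + 1] = 0.0
        envelope = np.exp(np.fft.rfft(cep).real)

        # Noise-like excitation: unit magnitude, random phase.
        phase = rng.uniform(0.0, 2.0 * np.pi, size=envelope.shape)
        spec = envelope * tilt * np.exp(1j * phase)
        out = np.fft.irfft(spec, n=frame)

        # Weighted overlap-add.
        y[start:start + frame] += out * window
        norm[start:start + frame] += window ** 2

    y /= np.maximum(norm, 1e-8)
    return y[frame:frame + len(x)]


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    # Synthetic "voiced" test signal: harmonics of a 125 Hz pitch.
    voiced = sum(np.sin(2 * np.pi * 125 * k * t) / k for k in range(1, 41))
    whisper_like = whisperize(voiced, sr)
    print(whisper_like.shape)
```

The sketch makes no attempt to match loudness or to treat unvoiced frames differently. The learned alternatives mentioned in the abstract (GMM or DNN mappings between acoustic features) would replace the fixed envelope and tilt rule with a trained model.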