2021
DOI: 10.1109/taslp.2020.3038524

An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning

Cited by 224 publications (105 citation statements)
References 224 publications (294 reference statements)
“…We conduct objective evaluation to assess the performance of our proposed model. We calculate Mel-cepstral distortion (MCD) [7,4] to measure the spectral distortion between the converted and reference Mel-spectrum for two male and two female speakers for three emotion combinations.…”
Section: Objective Evaluation (citation type: mentioning)
confidence: 99%
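The statement above names Mel-cepstral distortion (MCD) without giving its form. A minimal sketch of the commonly used definition, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) averaged over frames, is shown below. It assumes the two sequences are already time-aligned mel-cepstral coefficient vectors and that the 0th (energy) coefficient is excluded; the function name and input layout are hypothetical, and the citing paper's exact setup may differ.

```python
import math


def mel_cepstral_distortion(ref, conv):
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    ref, conv: lists of frames, each frame a list of coefficients.
    Assumption: index 0 holds the energy term and is skipped.
    """
    if len(ref) != len(conv) or not ref:
        raise ValueError("sequences must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for r, c in zip(ref, conv):
        if len(r) != len(c):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over coefficients 1..D.
        sq = sum((a - b) ** 2 for a, b in zip(r[1:], c[1:]))
        total += scale * math.sqrt(2.0 * sq)
    return total / len(ref)
```

Lower values indicate that the converted spectrum is closer to the reference. In practice the alignment step (for example, dynamic time warping) is done before this computation.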
“…Emotional voice conversion and speech voice conversion [4] differ in many ways. Speech voice conversion aims to change the speaker identity, whereas emotional voice conversion focuses on the emotional state transfer.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…In a recent VC review paper [48], it was shown that a sufficient amount of efforts has been dedicated to transferring knowledge from ASR and TTS to improving various aspects of VC, regardless of using a seq2seq model or not. The PPG-based methods [49]-[53] and the Parrotron system described in Section II-A facilitated nonparallel, any-to-one VC by utilizing ASR and TTS modules, respectively.…”
Section: Transfer Learning From ASR and TTS for VC (citation type: mentioning)
confidence: 99%
“…Aging upward or downward, or changing the perception of one’s gender, can similarly allow people to influence how they are received by their counterparts. Using deepfakes, it is possible to change a person’s identifying attributes such as skin tone, hair color, gender (Lu, Tai, and Tang 2018), age (Antipov, Baccouche, and Dugelay 2017), accent, and speech pattern (Sisman et al 2020). Many of these deepfakes can already be generated on the fly, and it is only a matter of time before all such conversions are possible in real time.…”
Section: Deepfakes: Covert Changes in Audio-Visual Cues (citation type: mentioning)
confidence: 99%