Interspeech 2018
DOI: 10.21437/interspeech.2018-1712

Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion

Abstract: In non-parallel Voice Conversion (VC) with the Iterative combination of a Nearest Neighbor search step and a Conversion step Alignment (INCA) algorithm, the occurrence of one-to-many and many-to-one pairs in the training data deteriorates the performance of the stand-alone VC system. Handling these pairs during training has been little explored. In this paper, we establish the relationship via an intermediate speaker-independent posteriorgram representation, instead of directly mapping the source spect…
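The alternation that the abstract names (a nearest-neighbor search step followed by a conversion step) can be illustrated with a minimal sketch. This is not the paper's method: the affine least-squares map stands in for whatever conversion model the authors use, and the function names `nearest_neighbors`, `inca_sketch`, and `many_to_one_count` are hypothetical. The last helper counts target frames matched by more than one source frame, which is the many-to-one situation the abstract refers to.

```python
import numpy as np


def nearest_neighbors(a, b):
    """For each row of a, return the index of the closest row of b (Euclidean)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def inca_sketch(src, tgt, n_iter=5):
    """Alternate nearest-neighbor alignment with an affine conversion fit.

    Assumption: an affine map fitted by least squares is swapped in as the
    conversion step purely for illustration.
    """
    # Affine design matrix: source frames with a constant column appended.
    X = np.hstack([src, np.ones((len(src), 1))])
    converted = src.copy()
    for _ in range(n_iter):
        # Alignment step: pair each converted source frame with a target frame.
        idx = nearest_neighbors(converted, tgt)
        # Conversion step: refit the map from source frames to their pairs.
        W, *_ = np.linalg.lstsq(X, tgt[idx], rcond=None)
        converted = X @ W
    return converted, idx, W


def many_to_one_count(idx):
    """Number of target frames matched by more than one source frame."""
    _, counts = np.unique(idx, return_counts=True)
    return int((counts > 1).sum())
```

With `src` and `tgt` as arrays of shape (frames, feature dimension), `inca_sketch(src, tgt)` returns the converted frames, the final alignment indices, and the fitted map; passing the indices to `many_to_one_count` shows how many target frames are shared by several source frames.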

Cited by 8 publications (2 citation statements)
References 33 publications

“…The standard approach for non-parallel training in the domain of voice conversion is the INCA algorithm [16] (see Section 3.2.2) that iteratively finds alignments between individual frames in the source and target styles. Variants of the basic INCA may use several subsequent frames [17], dynamic features [18], or custom distance metrics [19] in the INCA alignment process. As a recent alternative to INCA, Cycle-consistent adversarial networks (CycleGANs, [20]) have shown promise in the domain of voice conversion.…”
Section: Introduction (mentioning)
confidence: 99%
“…Speaker dependent speech is then synthesized with an average model and a speaker specific warping over the whole phrase. In [9] two DNNs are trained to model the VTLN and reverse VTLN step for each speaker. As the normalized features are unknown, the authors propose an iterative unsupervised algorithm: 1.…”
Section: Introduction (mentioning)
confidence: 99%