2012
DOI: 10.1109/lsp.2012.2225615

Mixture of Factor Analyzers Using Priors From Non-Parallel Speech for Voice Conversion

Abstract: A robust voice conversion function relies on a large amount of parallel training data, which is difficult to collect in practice. To tackle the sparse parallel training data problem in voice conversion, this paper describes a mixture of factor analyzers method that integrates prior knowledge from non-parallel speech into the training of the conversion function. Experiments on the CMU ARCTIC corpus show that the proposed method improves the quality and similarity of converted speech. With both objective an…

citations
Cited by 24 publications
(27 citation statements)
references
References 14 publications
“…Therefore, in the early 2010, several alternative linear transformation methods were developed. Examples are partial least square (PLS) regression [85], tensor representation [87], a trajectory HMM [88], mixture of factor analysers [89], local linear transformation [82] or noisy channel models [90].…”
Section: Voice Conversion (mentioning)
confidence: 99%
“…It achieves smooth feature transformations using a local linear transformation. Despite its popularity, known problems of JD-GMM include over-smoothing [73][74][75] and over-fitting [76,77] which has led to the development of alternative linear conversion methods such as partial least square (PLS) regression [76], tensor representation [78], a trajectory hidden Markov model [79], a mixture of factor analysers [80], local linear transformation [73] and a noisy channel model [81]. Non-linear approaches, including artificial neural networks [82,83], support vector regression [84], kernel partial least square [85] and conditional restricted Boltzmann machines [86], have also been studied.…”
Section: Voice Conversion (mentioning)
confidence: 99%
“…For example, joint density Gaussian mixture model (JD-GMM) [1], partial least squares regression [2], mixture of factor analyzers [3] and local linear transformation [4] methods try to build a local linear transformation function. In addition, methods, such as neural network [5,6] and dynamic kernel partial least squares regression [7] have also been proposed to learn the non-linear relationship between source and target speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although JD-GMM based methods can effective transform the source speech feature vectors into target speech feature space and generate converted speech with acceptable quality, the over-smoothing and over-fitting problems have been reported in [10,2,3,4]. Over-smoothing is due to the statistical average during training the mean vectors and covariance matrices of the Gaussian components [10] (e.g., each mean vector is a weighted summation of all the training vectors).…”
Section: Introduction (mentioning)
confidence: 99%
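
The "weighted summation" remark in the last statement can be written out explicitly. The following is a sketch that assumes the standard EM mean update for a Gaussian mixture. The symbols (joint source-target training vectors z_n, posterior weights γ_{nk}, component index k, number of training vectors N) are introduced here for illustration and are not taken from the cited papers.

```latex
% Assumed notation (not from the cited papers):
%   z_n         n-th joint source-target training vector, n = 1..N
%   \gamma_{nk} posterior probability of component k given z_n
\hat{\mu}_k
  = \frac{\sum_{n=1}^{N} \gamma_{nk}\, z_n}{\sum_{n=1}^{N} \gamma_{nk}},
\qquad
\gamma_{nk} = P(k \mid z_n) \ge 0 .
```

Because the normalized weights are non-negative and sum to one, each estimated mean is a convex combination of the training vectors. Frame-level detail is averaged out in the process, which is consistent with the over-smoothing that the statement describes.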