2015
DOI: 10.1186/s13636-015-0075-4

Small-parallel exemplar-based voice conversion in noisy environments using affine non-negative matrix factorization

Abstract: The need to have a large amount of parallel data is a large hurdle for the practical use of voice conversion (VC). This paper presents a novel framework of exemplar-based VC that only requires a small number of parallel exemplars. In our previous work, a VC technique using non-negative matrix factorization (NMF) for noisy environments was proposed. This method requires parallel exemplars (which consist of the source exemplars and target exemplars that have the same texts uttered by the source and target speake…

Citations: cited by 8 publications (6 citation statements)
References: 17 publications
“…In the non-parallel setting, the initialization is based on the NMF and NTD frameworks. This initialization method uses an adaptive matrix [42]. Finally, initialized parameters are optimized by Eqs.…”
Section: Conditions (mentioning)
confidence: 99%
“…Non-negative matrix factorization (NMF) [9,31,32] assumes that the speech can be expressed with exemplars and corresponding weights. NMF builds a dictionary consisting of corresponding exemplars from source speech and target speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…Depending on whether there are same utterance pairs in the training dataset, voice conversion can be categorized into two types, a parallel one and a non-parallel one. The early studies [5,6,7,8,9] are focused on parallel voice conversion by building the spectrum mapping between the source and target speaker. Among them, the statistical parametric approaches like Gaussian mixture model (GMM) [5,6] and partial least square regression [7] use the statistical model to learn the mapping between the source and target spectrum.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, these statistical parametric methods degrade the quality of the converted speeches due to the over-smoothing effects. Then the non-negative matrix factorization (NMF) based approaches [8,9] are proposed to address the over-smoothing effects by decomposing the spectrum into weighted linear combinations of exemplars.…”
Section: Introduction (mentioning)
confidence: 99%
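
The citation statements above describe exemplar-based NMF conversion as expressing a spectrum as a weighted linear combination of exemplars, using a dictionary of corresponding source and target exemplars. The following is a minimal sketch of that idea, not the method of the cited paper: it assumes two small parallel dictionaries whose columns are paired exemplars, keeps the source dictionary fixed, and fits non-negative weights with the standard Euclidean-distance multiplicative update (the cited works may use a different cost function or sparsity terms). The function names `estimate_activations` and `convert` are hypothetical, and matrices are plain nested lists so the block has no dependencies.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def estimate_activations(dictionary, spectrogram, n_iter=200, eps=1e-12):
    """Fit non-negative weights H so that dictionary @ H approximates spectrogram.

    The dictionary (bins x exemplars) stays fixed; only H (exemplars x frames)
    is updated, using the Euclidean multiplicative rule
    H <- H * (D^T X) / (D^T D H).
    """
    n_exemplars = len(dictionary[0])
    n_frames = len(spectrogram[0])
    h = [[1.0] * n_frames for _ in range(n_exemplars)]  # positive start
    d_t = transpose(dictionary)
    numer = matmul(d_t, spectrogram)  # D^T X, constant across iterations
    gram = matmul(d_t, dictionary)    # D^T D, constant across iterations
    for _ in range(n_iter):
        denom = matmul(gram, h)
        h = [[h[k][t] * numer[k][t] / (denom[k][t] + eps)
              for t in range(n_frames)]
             for k in range(n_exemplars)]
    return h


def convert(source_dict, target_dict, source_spec, n_iter=200):
    """Estimate weights on the source dictionary, then reuse them on the
    paired target dictionary to build the converted spectrogram."""
    h = estimate_activations(source_dict, source_spec, n_iter)
    return matmul(target_dict, h), h
```

Calling `convert(source_dict, target_dict, source_spec)` returns the converted spectrogram together with the weights. The conversion step relies entirely on the assumption that column k of the source dictionary and column k of the target dictionary represent the same speech content, which is why parallel exemplars are needed in this basic form.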