2012
DOI: 10.1016/j.asoc.2012.05.027

Comparing ANN and GMM in a voice conversion framework

Cited by 30 publications (15 citation statements)
References 7 publications
“…After parallel or non-parallel frame alignment of the source and target speech data in the training corpus, if x_t and y_t are the d-dimensional source and target feature vectors at frame t, respectively, the joint probability density of the augmented source and target feature vector z_t is modeled by a GMM as [32]:…”
Section: Spectral Mapping Using GMM
Citation type: mentioning; confidence: 99%
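The excerpt stops at the defining equation. As a hedged illustration of the idea it describes, the sketch below fits a GMM to stacked source/target frames z_t = [x_t, y_t]; the function name, array shapes, and the choice of scikit-learn are assumptions for illustration, not taken from the cited paper.

```python
# Minimal sketch: modeling the joint density p(x_t, y_t) with a GMM,
# following the description in the excerpt (scikit-learn assumed).
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(X, Y, n_components=8):
    """X, Y: (T, d) time-aligned source and target feature matrices."""
    Z = np.hstack([X, Y])                 # z_t = [x_t, y_t], shape (T, 2d)
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(Z)                            # estimates weights, means, covariances of p(z_t)
    return gmm
```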
“…In this paper, the performance of GMM-based SID systems [2], GMM-UBM based SV systems [28] and GMM-SVM based SV systems [29] against different voice conversion spoofing attacks on the TIMIT database [30] is examined and compared for all possible conversion directions. We will use three different voice conversion techniques: the standard GMM-based VC method [31,32], the WFW-based VC method [33] and the WFW− based VC method [33]. In GMM-based voice conversion, a mixture of Gaussian components is used to model the probability density of the joint source and target speakers' feature vectors.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
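For concreteness, one common form of the GMM-based mapping mentioned here is the per-frame minimum mean squared error regression sketched below (a Stylianou/Kain-style predictor over the joint model). This is a hedged sketch of the general technique, not necessarily the exact formulation of [31,32]; the function name and parameter shapes are illustrative.

```python
# Sketch of a GMM-based conversion function: given a joint GMM over z = [x, y]
# (dimension 2d), predict a target frame from a single source frame x.
import numpy as np
from scipy.stats import multivariate_normal

def convert_frame(gmm, x, d):
    mu_x, mu_y = gmm.means_[:, :d], gmm.means_[:, d:]       # (M, d) each
    Sxx = gmm.covariances_[:, :d, :d]                       # (M, d, d)
    Syx = gmm.covariances_[:, d:, :d]                       # (M, d, d)

    # Posterior P(m | x) under the marginal source-side GMM.
    lik = np.array([gmm.weights_[m] * multivariate_normal.pdf(x, mu_x[m], Sxx[m])
                    for m in range(gmm.n_components)])
    post = lik / lik.sum()

    # MMSE estimate: posterior-weighted sum of per-component linear regressions.
    y_hat = np.zeros(d)
    for m in range(gmm.n_components):
        y_hat += post[m] * (mu_y[m] + Syx[m] @ np.linalg.solve(Sxx[m], x - mu_x[m]))
    return y_hat
```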
“…General VC system design follows two simultaneous goals: obtaining the desired target perception and keeping speech intelligibility intact at the output. The general architecture of a VC system is divided into two distinct stages, namely (i) a training stage (generally off-line) and (ii) a conversion stage (off-line/online) [8], [9]. Extracting discriminative features that well approximate the speaker's behavior and building the source-target relationship are the contents of the training stage.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
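Read together with the earlier sketches, the two-stage structure described in this excerpt can be outlined as below. Frame alignment and feature extraction are assumed to have already happened, and all names are illustrative rather than taken from [8], [9]; the outline reuses the hypothetical fit_joint_gmm and convert_frame functions from the previous sketches.

```python
# Outline of the two-stage VC architecture: off-line training followed by
# per-utterance conversion (reuses the earlier illustrative functions).
import numpy as np

def train_stage(source_feats, target_feats, n_components=8):
    """Off-line training: learn the source-target relationship from aligned (T, d) features."""
    return fit_joint_gmm(source_feats, target_feats, n_components)

def conversion_stage(gmm, source_feats):
    """Conversion: map each source frame to an estimated target frame."""
    d = source_feats.shape[1]
    return np.vstack([convert_frame(gmm, x, d) for x in source_feats])
```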
“…Later on, with increased mathematical analysis, researchers proposed homomorphic features such as the real cepstrum, Mel Frequency Cepstrum Coefficients (MFCCs) [15] and the complex cepstrum [14], [16], which do not depend on any speech model. To ease intra-speaker variations, the use of features such as vocal tract length normalization and the mel-cepstral envelope is encouraged [8], [17]. Speech modeling is further simplified by introducing the harmonic-plus-noise model [18], [19], which is by far the simplest yet effective algorithm.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
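To make the homomorphic-feature terminology concrete, the sketch below computes a real cepstrum directly from its definition and MFCCs with librosa; the file path, frame length, sampling rate, and the use of librosa are assumptions for illustration only.

```python
# Illustration of the homomorphic features mentioned above: the real cepstrum
# (inverse DFT of the log magnitude spectrum) and MFCCs (librosa assumed).
import numpy as np
import librosa

def real_cepstrum(frame):
    """Real cepstrum of one windowed speech frame."""
    spectrum = np.fft.rfft(frame)
    return np.fft.irfft(np.log(np.abs(spectrum) + 1e-10))

y, sr = librosa.load("speech.wav", sr=16000)          # hypothetical input file
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, n_frames)
cep = real_cepstrum(y[:512] * np.hanning(512))        # cepstrum of one 512-sample frame
```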
“…Finally, the performance of the proposed filter bank-based VC model is compared with the state-of-the-art multiscale voice morphing using RBF analysis. This is done using various objective measures, such as the performance index (P_LSF) [1], formant deviation [7,20], and spectral distortion [20]. The commonly used subjective measures, such as the Mean Opinion Score (MOS) and ABX tests, are used to verify the quality and similarity of the converted speech signal [21].…”
Section: Introduction
Citation type: mentioning; confidence: 99%
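As a rough illustration of the objective side of such an evaluation, the snippet below computes a log-spectral distortion between converted and target frames. This is one common definition of spectral distortion and may differ from the exact measures used in [1] and [20]; the function name and frame layout are assumed.

```python
# Hedged sketch of an objective measure: mean log-spectral distortion (in dB)
# between converted and target magnitude spectra, one frame per row.
import numpy as np

def log_spectral_distortion(converted_mag, target_mag, eps=1e-10):
    """converted_mag, target_mag: (n_frames, n_bins) magnitude spectra."""
    diff_db = 20.0 * (np.log10(converted_mag + eps) - np.log10(target_mag + eps))
    return np.mean(np.sqrt(np.mean(diff_db ** 2, axis=1)))   # per-frame RMS, averaged
```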