2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS) 2016
DOI: 10.1109/icis.2016.7550889

Emotional voice conversion using deep neural networks with MCC and F0 features

Cited by 41 publications (38 citation statements)
References 13 publications
“…RMSE is an established metric used [33], [34], [37], [49], [50], [52], [57] to evaluate the proximity of predicted values by a mapping algorithm to those of the target, given as…”
Section: Root Mean Square Error
mentioning
confidence: 99%
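The quoted statement cuts off before the formula itself. As a rough illustration only, the sketch below computes RMSE under its standard textbook definition (the square root of the mean squared difference between predicted and target values). The function name `rmse` and the sample values are hypothetical and are not taken from the citing paper.

```python
import math


def rmse(predicted, target):
    """Root mean square error between two equal-length sequences of numbers.

    Assumes the standard definition: sqrt(mean((predicted - target)^2)).
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not predicted:
        raise ValueError("inputs must not be empty")
    # Sum the squared per-element differences, then average and take the root.
    squared_error = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared_error / len(predicted))


if __name__ == "__main__":
    # Hypothetical per-frame F0 values in Hz, for illustration only.
    converted_f0 = [200.0, 210.0, 190.0]
    target_f0 = [205.0, 208.0, 195.0]
    print(rmse(converted_f0, target_f0))
```

A lower value means the converted features lie closer to the target; identical sequences give 0.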
“…Spectral modelling uses mostly non-linear MGCEPs. Recent advances include unsupervised training with conditional restricted Boltzmann machine (CRBM) [36], pre-training using deep belief networks (DBNs) [34], modelling the spectrum and prosody simultaneously via bidirectional long short-term memory (LSTM) [37], end-to-end emotional-speech synthesis using Tacotron [38], among others.…”
Section: Introduction
mentioning
confidence: 99%
“…Thus, deep neural networks are emerging as an alternative for acoustic modelling, learning the relationship between linguistic features and acoustic features in the SPSS approach (Weijters & Thole, 1993; Zen et al, 2013). Deep neural networks can also be used as a modelling method for predicting the durations needed for speech synthesis (Riedi, 1995 (Kubichek, 1993; Luo et al, 2016).…”
Section: Introduction
unclassified
“…The LG-based method is insufficient to convert the prosody effectively owing to constraints of their linear models and low-dimensional F0 features. In our earlier work [16], we proposed a new NN-based method that can train the segmental F0 features for emotional prosody conversion. Although we conducted segmental processing to increase the dimensions of F0 features that can be trained by the NNs well, the segmental F0 features cannot model F0 in different temporal scales.…”
mentioning
confidence: 99%