2017
DOI: 10.1016/j.specom.2017.01.002

Dimensional paralinguistic information control based on multiple-regression HSMM for spontaneous dialogue speech synthesis with robust parameter estimation

Cited by 13 publications (10 citation statements)
References 14 publications
“…Controllability of paralinguistic information with the proposed deep neural network (DNN) was investigated by comparison with the method based on the MAP-MRHSMM [4]. Figure 2 shows the change in the distribution of averaged fundamental frequency of synthesized utterances, with given 2-dimensional values of paralinguistic information.…”
Section: Analysis Of Synthesized Speech
Type: mentioning
confidence: 99%
“…Previously, we studied a dialogue speech synthesis based on multiple-regression hidden semi-Markov model (MRHSMM) [4]. Although the MRHSMM enabled to control paralinguistic information in the form of dimensions such as pleasant-unpleasant, aroused-sleepy, etc., synthesized speech tended to have extreme parameters due to badly estimated regression matrices. We have shown that MAP estimation of regression matrices was effective to reduce the overfitting problem [4]. However, the problem still remains for certain combinations of given input as paralinguistic information.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…While there has been some past work on building speech synthesisers from spontaneous speech audio [3,4,5,6], this was restricted to small, hand-annotated corpora and statistical parametric speech synthesisers. Bigger corpora can usually be sourced from found speech recordings, but the output quality has been disappointing, e.g., [7].…”
Section: Introduction
Type: mentioning
confidence: 99%
“…The most commonly used evaluation protocols for TTS are listening tests with respect to quality, naturalness, intelligibility, similarity and expressiveness [7,8,9,10]. Application-dependent measures are also used, such as those for audiobook reading [11] and spoken dialogue systems or human-robot interaction [12,13,14,15,16,17,18].…”
Section: Introduction
Type: mentioning
confidence: 99%
“…For example, interpretation about what is and what is not "CRITICISM" can be inconsistent among participants and researchers. This problem may not sound serious when a limited number of categories [4,5] or low-dimensional representation [10] are of interest. However, it becomes severe when using larger number of intentions for evaluation because it will be harder to precisely comprehend their difference.…”
Section: Introduction
Type: mentioning
confidence: 99%