Control of prosodic focus in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model

Ochi, Kensuke; Hirose, Keikichi; Minematsu, Nobuaki

doi:10.1109/icassp.2009.4960569

Cited by 13 publications

(7 citation statements)

References 3 publications


“…For instance, we have developed a corpus-based method to predict differences in F0 model commands between two versions of utterances of the same linguistic content [17,18]. Applying the predicted differences to the baseline version of speech, the new version of speech can be realized.…”

Section: Discussion (mentioning)

confidence: 99%

Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis

Matsuda; Hirose; Minematsu (2012), Acoust. Sci. & Tech.

Self Cite


Speech synthesis based on hidden Markov models (HMMs) processes both segmental and prosodic features of speech together in a frame-by-frame manner. One benefit of this method is that time alignment of both features is kept automatically. However, when the training data are limited, frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning a wide time range, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. A method is developed to modify F0 contours in the framework of the generation process model (henceforth, F0 model) by referring to linguistic information of the input text (word boundary and accent type). It takes F0 variances obtained through HMM-based speech synthesis into account during the process. In a listening experiment on synthetic speech, the method is shown to produce better quality on average than HMM-based speech synthesis. Since the F0 model can clearly relate its commands to linguistic (and para-/non-linguistic) information, the method has an additional advantage: changing speech styles and/or adding further information (such as emphasis) can be done easily by manipulating the commands.
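To make the idea of "manipulating the commands" concrete, the following is a minimal Python sketch of the generation process model in its commonly published form: log F0 is a baseline value plus phrase components (impulse responses) and accent components (step responses). The class and function names are hypothetical, and the constants alpha = 3.0, beta = 20.0, and gamma = 0.9 are commonly quoted defaults assumed here, not values taken from the papers listed on this page. The helper `apply_differences` only illustrates the approach described in the citation statement above (adding predicted command differences to a baseline); it adjusts accent amplitudes alone, which is a simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseCommand:
    t0: float  # onset time of the phrase command (s)
    ap: float  # magnitude of the phrase command


@dataclass
class AccentCommand:
    t1: float  # onset time of the accent command (s)
    t2: float  # offset time of the accent command (s)
    aa: float  # amplitude of the accent command


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero for t < 0)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrases, accents):
    """Natural log of F0 at time t: baseline plus phrase and accent components."""
    value = math.log(fb)
    for p in phrases:
        value += p.ap * phrase_response(t - p.t0)
    for a in accents:
        value += a.aa * (accent_response(t - a.t1) - accent_response(t - a.t2))
    return value


def apply_differences(accents, deltas):
    """Return new accent commands with predicted amplitude differences added.

    Assumption: only amplitudes change; timing is kept from the baseline.
    """
    return [AccentCommand(a.t1, a.t2, a.aa + d) for a, d in zip(accents, deltas)]


# Illustrative values only (not from any corpus).
phrases = [PhraseCommand(t0=0.0, ap=0.4)]
baseline = [AccentCommand(t1=0.3, t2=0.6, aa=0.3)]
focused = apply_differences(baseline, [0.2])  # raise the accent to mark focus

for t in (0.2, 0.5, 0.8):
    print(t, math.exp(log_f0(t, 100.0, phrases, baseline)),
          math.exp(log_f0(t, 100.0, phrases, focused)))
```

Because each command maps to a linguistic unit, a change such as focus can be expressed as a small edit to a few amplitudes rather than as a frame-by-frame modification of the contour.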


“…In the proposed method, generated F0 contours are represented as the sum of three contours, two of which are generated from HMM's trained using the phrase and accent components of the F0 model, and one from HMM's trained using F0 residuals. The extraction of F0 model commands is considered to be easy for the former two contours, leading to a flexible and systematic control of prosody [22,23].…”

Section: HMM-based Speech Synthesis (mentioning)

confidence: 99%

Speech Prosody in Phonetics and Technology

Hirose (2016), International Symposium on Applied Phonetics (ISAPh 2016)

Self Cite


As a feature unique to spoken language, speech prosody plays an important role in human communication. Although the acoustic features of speech are most often viewed in a frame-by-frame manner, this is not always appropriate for prosodic features, since they are tightly related to higher-level linguistic information, such as syntactic and discourse structures, and spread over wide time spans, such as syllables, words, and phrases. To handle this situation, models for prosody have been developed. Among the many models, the generation process model of fundamental frequency contours is attractive, since it relates well to the linguistic information of utterances. The model was successfully applied to hidden Markov model (HMM) based speech synthesis and to a listening test to determine the (perceptual) categorical boundaries of Japanese accent types.


“…The Fujisaki model (Fujisaki, 1983) is another well-known prosodic model in ESS (Chen et al., 2004; Kiriyama et al., 2002). Ochi et al. (2009) used this model to control focus by modifying the Fujisaki model parameters. Prosodic variation caused by focus was investigated by considering the difference between utterances with and without focus.…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech

Tao; Hirose; et al. (2015), Speech Communication

Self Cite
