ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.1986.1168659
Acoustic characteristics and the underlying rules of intonation of the common Japanese used by radio and television announcers

Cited by 6 publications (5 citation statements)
References 1 publication
“…Normalizations of these parameters are defined by where , , and are the initial type, final type, and tone type of the th syllable, respectively. Here, and are the mean and variance of the parameter , and they are given by (12) and (13) for the pitch contour for syllables belonging to the th tone type; (14) and (15) for the energy level for syllables belonging to the th final type; (16) and (17) and 2, for the initial duration and the preceding pause duration for syllables belonging to the th initial type; and (18) and (19) for the final duration for syllables belonging to the th final type. It is noted that the scaling factor of in (10) and (11) is used to make certain the three output prosodic parameter groups have approximately equal contributions to the objective function of the EBP training algorithm.…”
Section: The Proposed RNN-Based Prosodic Information Synthesizer
confidence: 99%
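The per-type normalization described in the citation statement above (z-scoring each prosodic parameter against the mean and variance of its own category, then scaling so that the parameter groups contribute comparably to the EBP training objective) can be sketched as follows. The function name, the example data, and the type labels are illustrative assumptions, not taken from the cited paper.

```python
import numpy as np

def normalize_by_type(values, types, scale=1.0):
    """Z-score-normalize each value against the mean and standard
    deviation of its own category (e.g. tone type, initial type, or
    final type), then apply a scaling factor so that different
    parameter groups contribute comparably to a training objective."""
    values = np.asarray(values, dtype=float)
    types = np.asarray(types)
    out = np.empty_like(values)
    for t in np.unique(types):
        mask = types == t
        mu = values[mask].mean()
        sigma = values[mask].std()
        # Guard against a degenerate category with zero variance.
        out[mask] = scale * (values[mask] - mu) / (sigma if sigma > 0 else 1.0)
    return out

# Example: syllable energy levels grouped by a hypothetical final type.
energy = [60.0, 62.0, 55.0, 57.0]
final_type = [0, 0, 1, 1]
print(normalize_by_type(energy, final_type))  # [-1.  1. -1.  1.]
```

Each category is normalized independently, so syllables of different tone or final types become directly comparable network targets.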
“…Although many methods for TTS prosody generation have previously been proposed for various languages [7]-[9], [18], [25], [27], [38]-[41], it is still generally difficult to elegantly invoke high-level linguistic features in exploring the prosodic phrase structure of a spoken language for prosodic information generation. The resulting synthesized prosodic parameters are therefore inadequate for generating natural, fluent and unrestricted synthetic speech.…”
Section: Introduction
confidence: 99%
“…Both linguistic prediction and acoustic prediction modules, by assuming there exists a unique and optimal prosody realization for any input sentence, predict the prosody patterns or acoustic targets deterministically. Various machine learning algorithms have been employed to predict the most likely phonological representation for a given text stream (Wang and Hirschberg, 1991;Wightman and Ostendorf, 1994;Taylor and Black, 1998;Hirschberg and Prieto, 1996;Ostendorf and Veilleux, 1994;Chu and Qian, 2001), or to predict the most likely acoustic representation from a given phonological representation (Fujisaki et al, 1986;Ross and Ostendorf, 1999;Chen et al, 1998). In such models, minimum squared error (MSE) is often used as the objective measure.…”
Section: Introduction
confidence: 97%
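The minimum squared error (MSE) objective mentioned in the statement above can be illustrated with a minimal sketch; the function name and the per-syllable values are hypothetical, chosen only to show the measure.

```python
def mse(predicted, target):
    """Mean squared error between predicted and target prosodic
    parameters (e.g. per-syllable log-F0 values), the objective
    measure such deterministic prediction models commonly minimize."""
    assert len(predicted) == len(target)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

# Hypothetical per-syllable log-F0 targets vs. a model's predictions.
target = [5.0, 5.2, 4.9, 4.7]
predicted = [5.1, 5.0, 4.9, 4.8]
print(mse(predicted, target))  # ≈ 0.015
```

Because MSE assumes a single optimal realization per input, it averages away legitimate prosodic variability, which is the limitation the citing authors point to.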
“…Synthesis of fundamental frequency is an important part of a text-to-speech system. A general approach to the synthesis of fundamental frequency is to invoke some phonological rules for synthesis, which emulate the pronunciation rules of human beings (Klatt, 1987;O'Shaughnessy, 1987;Willems et al, 1988;Fujisaki et al, 1986;Fujisaki and Kawai, 1988; Lee et al, 1989). Figure 1 shows a schematic diagram of this approach.…”
Section: Introduction
confidence: 99%
“…Traditionally, phonological rules are inferred from observing a large set of utterances with the help of linguists. Due to the fact that it is, in general, difficult to explore the effect of mutual interaction of linguistic features on different levels, most F0-by-rule algorithms consider the influence of each feature independently and then add them up ('t Hart and Cohen, 1973;Olive and Nakatani, 1974;Olive, 1975;Fujisaki et al, 1986;Lee et al, 1989). For example, in one approach to synthesizing English intonation ('t Hart and Cohen, 1973), two falling declination lines gradually converging toward the end of each sentential utterance were chosen to bound the F0 contour.…”
Section: Introduction
confidence: 99%
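The two falling declination lines of 't Hart and Cohen's approach, gradually converging toward the end of the utterance and bounding the F0 contour, can be sketched as straight lines in time. The function signature and the frequency values are illustrative assumptions, not parameters from the cited work.

```python
def declination_bounds(duration, top_start, top_end, base_start, base_end, n=5):
    """Sample two falling declination lines (topline and baseline) that
    converge toward the end of an utterance; an F0 contour is assumed
    to lie between them. Returns (time, topline, baseline) triples."""
    times = [duration * i / (n - 1) for i in range(n)]
    top = [top_start + (top_end - top_start) * t / duration for t in times]
    base = [base_start + (base_end - base_start) * t / duration for t in times]
    return list(zip(times, top, base))

# Illustrative values: topline falls 220 -> 150 Hz, baseline 120 -> 110 Hz
# over a 2-second utterance, so the band narrows toward the sentence end.
for t, hi, lo in declination_bounds(2.0, 220.0, 150.0, 120.0, 110.0):
    print(f"t={t:.1f}s  top={hi:.0f} Hz  base={lo:.0f} Hz")
```

The topline falls faster than the baseline, so the gap between them shrinks over the utterance, matching the "gradually converging" description in the quoted passage.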