1989
DOI: 10.1121/1.2027539

Combining statistical and linguistic models for synthesis of prosodic contours

Abstract: “It is very important to get the timing, intonation, and allophonic detail correct in order that a sentence sound intelligible and moderately natural.” [D. Klatt, J. Acoust. Soc. Am. 82, 737–793 (1987)]. This important review article included prosody as a research issue for improving text-to-speech synthesis. Klatt's suggestions for improving prosody are addressed here: development of new systems for control of F0 and duration, and mechanisms for adding variety. The proposed synthesis system is a statistical mo…

Cited by 9 publications
(11 citation statements)
“…Participants listened to two stories (one male, one female speaker) from the Boston University Radio Speech Corpus (for full stimulus transcripts, see Extended Data Table 1-1) (Ostendorf et al, 1995), each once at regular speech rate and once slowed to one-third speech rate. Overall, the stimuli contained 26 paragraphs (each containing 1-4 sentences) of 10-60 s duration, with silent periods of 500-1100 ms inserted between paragraphs to allow measuring onset responses in the MEG without distortion from preceding speech.…”
Section: Speech Stimulus
mentioning
confidence: 99%
“…(Arvaniti & Baltazani 2005), in contrast to the simple high phrase tone in Figure 2. In general, the complexity of the F0 movement (indicating the existence of two targets), the scaling of the high tone, and the perceived boundary strength (Ostendorf et al 1995; Wightman et al 1992; Nespor & Vogel 2007; Pierrehumbert 1980; Beckman & Pierrehumbert 1986) were the main criteria for annotating phrase versus boundary tones. Interestingly, the results further revealed that phrasing was realized differently across the contrast levels in topic constituents, where the presence and type of edge tones varied as shown in Table 2.…”
Section: Results
mentioning
confidence: 99%
“…Note the complex fall rise movement of the contour before the boundary corresponding to an L-H% boundary tone (Arvaniti & Baltazani 2005), in contrast to the simple high phrase tone in Figure 2. In general, the complexity of the F0 movement (indicating the existence of two targets), the scaling of the high tone, and the perceived boundary strength (Ostendorf et al 1995; Wightman et al 1992; Nespor & Vogel 2007; Pierrehumbert 1980; Beckman & Pierrehumbert 1986) were the main criteria for annotating phrase versus boundary tones.…”
mentioning
confidence: 99%
“…This recognizer lacks a model that imposes the high-level linguistic constraints and assumes that prosody can be determined completely from their syllabic-timed acoustic observations and pre-compiled lexical stress information. Nevertheless, it is successful on labeling pitch accents on the Radio News Corpus [3] with 84% accuracy on accent presence/absence prediction, about 30% higher than the estimated chance level. However, it does not perform well on intonational phrase boundary (IPB) detection: IPB recognition accuracy is only 71%, 12% below the estimated chance level.…”
Section: Introduction
mentioning
confidence: 93%