A speaker structures utterances very clearly by grouping words into phrases. This facilitates the listener’s recovery of the meaning of the utterance and the speaker’s intention. To this end, a speaker uses, among other things, suprasegmental cues, such as intonation, pauses and prefinal lengthening of speech sounds. The research described here is concerned with the relationship between the strength of prosodic boundaries in spoken utterances as perceived by untrained listeners (perceptual boundary strength, PBS) and the phonetic cues melodic discontinuity, pause, declination reset and, to a limited extent, prefinal lengthening. The results indicate that untrained listeners can give reliable and usable judgments of PBS and that this is true even if the lexical content of the utterances is made unrecognizable, thus blocking access to lexical, syntactic, and semantic information. There is a clear relation between PBS and the phonetic cues, the general trend being for PBS values to increase as more phonetic cues are associated with a given word boundary. The experimentally obtained PBS values were also compared with boundaries predicted on the basis of a syntactic and metrical analysis of the material. Close agreement was found between the measured PBS values and the theoretically predicted prosodic structure.
We present a data-to-speech system called D2S, which can be used for the creation of data-to-speech systems in different languages and domains. The most important characteristic of a data-to-speech system is that it combines language and speech generation: language generation is used to produce a natural language text expressing the system's input data, and speech generation is used to make this text audible. In D2S, this combination is exploited by using linguistic information available in the language generation module for the computation of prosody. This allows us to achieve a better prosodic output quality than can be achieved in a plain text-to-speech system. For language generation in D2S, the use of syntactically enriched templates is guided by knowledge of the discourse context, while for speech generation pre-recorded phrases are combined in a prosodically sophisticated manner. This combination of techniques makes it possible to create linguistically sound but efficient systems with high-quality language and speech output.
This paper addresses two main questions: (a) Can listeners assign values of perceived boundary strength to the juncture between any two words? (b) If so, what is the relationship between these values and various (combinations of) suprasegmental features? Three speakers read a set of twenty utterances of varying length and complexity. A panel of nineteen listeners assigned boundary strength values to each of the 175 word boundaries in the material. The correlation was then established between the variable strength of the perceived boundaries and three prosodic variables: melodic discontinuity, declination reset and pause. The results show that speakers may differ in their strategies of prosodic boundary marking and that listeners agree in the perceptual weight they attribute to the prosodic cues.
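The kind of analysis described above, relating perceived boundary strength to prosodic cues, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ratings, the 1-to-10 rating scale, the binary coding of the three cues, and the use of a plain Pearson correlation between the mean rating and the number of cues present. None of this reproduces the paper's data or its actual statistical procedure.

```python
from math import sqrt

# Hypothetical cue names, coded 1 (present) or 0 (absent) at each word boundary.
CUES = ("melodic_discontinuity", "declination_reset", "pause")


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy data: each entry is one word boundary with ratings from
# three imaginary listeners on an assumed 1-to-10 scale.
boundaries = [
    {"ratings": [1, 1, 2], "melodic_discontinuity": 0, "declination_reset": 0, "pause": 0},
    {"ratings": [4, 3, 5], "melodic_discontinuity": 1, "declination_reset": 0, "pause": 0},
    {"ratings": [6, 7, 6], "melodic_discontinuity": 1, "declination_reset": 0, "pause": 1},
    {"ratings": [9, 8, 10], "melodic_discontinuity": 1, "declination_reset": 1, "pause": 1},
    {"ratings": [3, 2, 4], "melodic_discontinuity": 0, "declination_reset": 0, "pause": 1},
]

# Mean rating per boundary (a stand-in for PBS) and number of cues present.
pbs = [mean(b["ratings"]) for b in boundaries]
cue_counts = [sum(b[c] for c in CUES) for b in boundaries]

r = pearson(cue_counts, pbs)
print(f"Correlation between cue count and mean rating: {r:.2f}")
```

Counting cues treats all three as equally weighted, which is a simplification; a regression with one predictor per cue would be the natural next step for estimating the perceptual weight of each cue separately.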