Prosody is a general term for the following features of speech: pitch and intonation, stress, articulation rate, sound intensity, and time structure (rhythm and pauses). During verbal communication, various prosodic forms contribute to expressing the content of the message (the information carried by the text, emotional expression, imitation of a situation, etc.). Prosody can therefore be represented as a multivariable function with a rather large number of variables, which makes it difficult to describe the complex process for all situations, meanings, and emotions. In this paper we try to give a phonetic-level characterization of the pitch and intonation structure, and of the course of intensity over time, for the main Hungarian sentence types, using a unified description. This manner of description is new for Hungarian. It is based on a unified relative scale in which relative distances in pitch and intensity, rather than physical values, are used to characterize the melody forms and the intensity levels. This description allows these two prosodic elements to be represented independently of the speaker's personal features (mean F0 value, F0 range, etc.). The representation makes it possible to express the crossfunctions among the melody forms of different expressions. This means that complete prosodic patterns can be predicted for any text without an acoustic analysis.
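The idea of a speaker-independent relative scale can be illustrated with a minimal sketch. The abstract does not specify how the relative scale is constructed, so the mapping below is an assumption: it expresses each voiced F0 sample as a fraction of the speaker's own range, measured in semitones. The function names `hz_to_semitones` and `relative_contour` are hypothetical and do not come from the paper.

```python
import math


def hz_to_semitones(f0_hz, ref_hz):
    """Distance of f0_hz from ref_hz in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def relative_contour(f0_track_hz):
    """Map voiced F0 samples onto a 0..1 scale within the speaker's own range.

    Assumption: samples <= 0 mark unvoiced frames and are dropped.
    0.0 is the speaker's lowest voiced pitch, 1.0 the highest.
    """
    voiced = [f for f in f0_track_hz if f > 0]
    if not voiced:
        return []
    lowest, highest = min(voiced), max(voiced)
    span = hz_to_semitones(highest, lowest)
    if span == 0:
        # A perfectly flat contour has no range to normalize against.
        return [0.0 for _ in voiced]
    return [hz_to_semitones(f, lowest) / span for f in voiced]


# Two hypothetical speakers producing the same melody an octave apart.
low_voice = relative_contour([100.0, 200.0, 100.0 * 2 ** 0.5, 0.0])
high_voice = relative_contour([200.0, 400.0, 200.0 * 2 ** 0.5])
```

Under this assumed mapping both speakers yield the same relative contour, which is the property the abstract attributes to its relative scale: the melody form is described without reference to the speaker's mean F0 or F0 range.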
1. Introduction

Examination of the prosodic structure (mainly intonation patterns) of continuous speech has become more and more important in the last decade, while the fields of application of automatic speech generation have grown drastically due to the industrialization of information technology. These applications need increasingly better speech quality in text reading (continuous news reading, e-mail reading, various talking services such as book reviews, weather forecasts, prose reading, etc.), and also in services where automatic dialogues are realized between the machine and the client. A number of models have been constructed in the last decade to describe the inherent structure of intonation, e.g., for Dutch (Collier 1990; Terken-Collier 1990), for German *
Abstract. This paper gives an overview of the design and development of an experimental restricted-domain, corpus-based unit selection text-to-speech (TTS) system for Hungarian. The experimental system generates weather forecasts in Hungarian. 5260 sentences were recorded, creating a speech corpus containing 11 hours of continuous speech. A Hungarian speech recognizer was applied to label speech sound boundaries. Word boundaries were also marked automatically. The unit selection follows a top-down hierarchical scheme using words and speech sounds as units. A simple prosody model is used, based on the relative position of words within a prosodic phrase. The quality of the system was compared to two earlier Hungarian TTS systems. A subjective listening test was performed by 221 listeners. The experimental system scored 3.92 on a five-point mean opinion score (MOS) scale. The earlier unit concatenation TTS system scored 2.63, the formant synthesizer scored 1.24, and natural speech scored 4.86.
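The top-down hierarchical scheme mentioned above can be sketched in a few lines. This is only an illustration of the fallback order (whole word first, speech sounds otherwise). The abstract does not describe the selection costs or data structures, so `select_units`, `word_inventory`, `lexicon`, and `phone_inventory` are hypothetical names, and a real unit selection system would normally also rank competing candidates by target and concatenation costs.

```python
def select_units(words, word_inventory, lexicon, phone_inventory):
    """Plan units top-down: use a recorded word if one exists,
    otherwise fall back to the word's individual speech sounds.

    All arguments are illustrative assumptions:
      words           -- list of words to synthesize
      word_inventory  -- set of words available as whole recorded units
      lexicon         -- dict mapping a word to its list of speech sounds
      phone_inventory -- set of speech sounds available in the corpus
    """
    plan = []
    for word in words:
        if word in word_inventory:
            # Highest level of the hierarchy: a whole recorded word.
            plan.append(("word", word))
            continue
        # Lower level: build the word from its speech sounds.
        for phone in lexicon[word]:
            if phone not in phone_inventory:
                raise KeyError(f"no recorded unit for speech sound {phone!r}")
            plan.append(("phone", phone))
    return plan


# Hypothetical inventory: "holnap" exists as a recorded word, "nap" does not.
plan = select_units(
    ["holnap", "nap"],
    word_inventory={"holnap"},
    lexicon={"nap": ["n", "a", "p"]},
    phone_inventory={"n", "a", "p"},
)
```

Preferring the largest available unit reduces the number of concatenation points, which is one common motivation for a word-first scheme in a restricted domain such as weather forecasts.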