Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora

Xydas, Gerasimos

doi:10.1093/ietisy/e88-d.3.510

Cited by 13 publications

(8 citation statements)

References 18 publications

“…Further improvements in the tree-structured predictors can be achieved by the introduction of more delicate linguistic features as has been inspected on [16] for the CART approach.…”

Section: Discussion (mentioning)

confidence: 99%

Experimental Evaluation of Tree-Based Algorithms for Intonational Breaks Representation

Zervas

Xydas

Fakotakis

et al. 2005

Text, Speech and Dialogue

Abstract. The prosodic specification of an utterance to be spoken by a Text-to-Speech synthesis system can be devised in break indices, pitch accents and boundary tones. In particular, the identification of break indices formulates the intonational phrase breaks that affect all the forthcoming prosody-related procedures. In the present paper we use tree-structured predictors, and specifically the commonly used in similar tasks CART and the introduced C4.5 one, to cope with the task of break placement in the presence of shallow textual features. We have utilized two 500-utterance prosodic corpora offered by two Greek universities in order to compare the machine learning approaches and to argue on the robustness they offer for Greek break modeling. The evaluation of the resulted models revealed that both approaches were positively compared with similar works published for other languages, while the C4.5 method accuracy scaled from 1% to 2.7% better than CART.

“…The audio recordings were digitized through the audio card. DEMOSTHéNES is a modular and scalable (Xydas & Kouroupetroglou, 2001a), multilingual and polyglot (Xydas & Kouroupetroglou, 2001b) TTS system that supports Greek and English with various voices and incorporates advanced speech synthesis methodologies in order to produce almost natural pitch and prosody (Xydas, Spiliotopoulos, & Kouroupetroglou, 2005; Xydas, Zervas, Kouroupetroglou, Fakotakis, & Kokkinakis, 2005). The DEMOSTHéNES TTS system used in this study incorporates a diphone-based MBROLA (Multi-Band Resynthesis OverLap and Process) synthesis technique (Dutoit, Pagel, Pierret, Bataille, & van Der Vreken, 1996), with the female voice and the following parameters: pitch = 210 Hz; speed = 95 wpm.…”

Section: Methods (mentioning)

confidence: 99%

Differences Among Sighted Individuals and Individuals with Visual Impairments in Word Intelligibility Presented via Synthetic and Natural Speech

Papadopoulos

Katemidou

Koutsoklenis

et al. 2010

Augmentative and Alternative Communication

This study investigated word intelligibility among sighted individuals and individuals with visual impairments for both natural and synthetic speech. Both groups of participants performed significantly better when identifying words presented via natural speech. The results also demonstrated that individuals with visual impairments were more successful than their sighted peers in understanding words presented via synthetic speech, with experience being the most critical factor in identifying words for the participants with visual impairments. Finally, the findings show the correlation between intelligibility and key factors such as age and the overall use of text-to-speech systems.

“…The labelling of the intonational phenomena had been conducted mainly by listening to the recorded utterance in conjunction to observation of amplitude and pitch contour of the speech signal. The annotator's transcription consistency was further evaluated by cross checking statistically our data with a prosodic corpus constructed at the University of Athens for speech synthesis purposes [32].…”

Section: The GRToBI Prosody Annotation System (mentioning)

confidence: 99%

Prosodic Boundary Prediction for Greek Speech Synthesis

Zervas

2013

JCSA

In this article, we evaluate features and algorithms for the task of prosodic boundary prediction for Greek. For this purpose a prosodic corpus composed of generic domain text was constructed. Feature contribution was evaluated and ranked with the application of information gain ranking and correlation-based feature selection filtering methods. The resulting datasets were applied to C4.5 decision tree, one-neighbour instance-based learner and Bayesian learning methods. Exploiting the models' performance led us to the construction of a practically optimal feature set whose prediction effectiveness was evaluated with two prosodic databases. In terms of total accuracy and F-measure, evaluation results established the decision tree effectiveness in learning rules for prosodic boundary prediction.
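
The abstract above mentions information gain ranking of features. As a rough illustration only, the sketch below computes information gain for two categorical features and ranks them. The feature names (`punctuation_follows`, `pos_tag`) and the six data rows are invented for this example and are not taken from the cited article or its corpora.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one entry per word juncture (assumption, not real corpus data).
labels = ["break", "no_break", "break", "no_break", "no_break", "break"]
features = {
    "punctuation_follows": ["yes", "no", "yes", "no", "no", "yes"],
    "pos_tag": ["noun", "noun", "verb", "verb", "noun", "verb"],
}

# Rank feature names from most to least informative.
ranking = sorted(features,
                 key=lambda name: information_gain(features[name], labels),
                 reverse=True)
```

In this toy set `punctuation_follows` separates the two classes perfectly, so it ranks above `pos_tag`. A real ranking would depend on the actual corpus and feature set.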
