Augmented Auditory Representation of e-Texts for Text-to-Speech Systems

Xydas, Gerasimos

doi:10.1007/3-540-44805-5_17

Cited by 17 publications

(19 citation statements)

References 4 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…In Figure 8 a proposed real-time system that automatically produces emotional annotation to documents and conveys the visual elements into acoustic modality using expressive speech synthesis is presented. Future work includes the use of an e-TSA composer (Xydas & Kouroupetroglou, 2001a, 2001b (Xydas et al, 2005) on the DEMOSTHeNES Text-to-Speech platform (Xydas & Kouroupetroglou, 2001c) and models for expressive speech synthesis as proposed by Schröder (Schröder, 2006), for acoustic rendition of emotionally annotated documents. …”

Section: Conclusion Future Work and Potential Applicationsmentioning

confidence: 99%

A Methodology for the Extraction of Reader's Emotional State Triggered from Text Typography

Tsonos¹

2008

Tools in Artificial Intelligence

Self Cite

View full text Add to dashboard Cite

Section: Conclusion Future Work and Potential Applicationsmentioning

confidence: 99%

A Methodology for the Extraction of Reader's Emotional State Triggered from Text Typography

Tsonos¹

2008

Tools in Artificial Intelligence

Self Cite

View full text Add to dashboard Cite

“…As we described above, the book is formatted in LogicML. Following previous works on Document-to-Audio conversion [30], the semantics meta-data can be acoustically represented by specific auditory elements like (a) alternative text insertion in the document's text stream, (b) modifications in the prosody, (c) switching voices and (d) inserting other sounds like earcons and auditory icons in the waveform stream, according to the class of meta-data provided in the e-book. The user can be trained to recognize and to combine speech and sounds with specific commands and events.…”

Section: Delivering Books Into Acoustic Modalitymentioning

confidence: 99%

“…3) [29] [30]. It will be mapped in specific acoustic elements, as mentioned above, producing a new annotated document and auditory synthesizer will implement the mapping (the output files can be e.g.…”

Section: Delivering Books Into Acoustic Modalitymentioning

confidence: 99%

Auditory Accessibility of Metadata in Books: A Design for All Approach

Tsonos

Xydas

2007

Lecture Notes in Computer Science

Self Cite

View full text Add to dashboard Cite

Abstract. There are two issues that are challenging in the life-cycle of Digital Talking Books (DTB): the automatic labeling of text formatting meta-data in documents and the multimodal representation of the text formatting semantics. We propose an augmented design-for-all approach for both the production and the reading processes of DAISY compliant DTBs. This approach incorporates a methodology for the real-time extraction and the semantic labeling of text formatting meta-data. Furthermore, it includes a unified approach for the multimodal rendering of text formatting, structure and layout meta-data by utilizing a Document-to-Audio platform to render the acoustic modality.

show abstract

“…Concept-to-Speech (CtS) systems (i.e. a Natural Language Generation -NLG -system coupled with a TtS system [9]) can provide linguistic information which can be used in prosody modeling [10], [11].…”

Section: Introductionmentioning

confidence: 99%

Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora

Xydas¹

2005

IEICE Transactions on Information and Systems

View full text Add to dashboard Cite

SUMMARY Synthetic speech usually suffers from bad F0 contour surface. The prediction of the underlying pitch targets robustly relies on the quality of the predicted prosodic structures, i.e. the corresponding sequences of tones and breaks. In the present work, we have utilized a linguistically enriched annotated corpus to build data-driven models for predicting prosodic structures with increased accuracy. We have then used a linear regression approach for the F0 modeling. An appropriate XML annotation scheme has been introduced to encode syntax, grammar, new or already given information, phrase subject/object information, as well as rhetorical elements in the corpus, by exploiting a Natural Language Generator (NLG) system. To prove the benefits from the introduction of the enriched input meta-information, we first show that while tone and break CART predictors have high accuracy when standing alone (92.35% for breaks, 87.76% for accents and 99.03% for endtones), their application in the TtS chain degrades the Linear Regression pitch target model. On the other hand, the enriched linguistic meta-information minimizes errors of models leading to a more natural F0 surface. Both objective and subjective evaluation were adopted for the intonation contours by taking into account the propagated errors introduced by each model in the synthesis chain. key words : prosody modeling, text-to-speech, linguistic meta-information, synthetic prosody evaluation

show abstract

Augmented Auditory Representation of e-Texts for Text-to-Speech Systems

Cited by 17 publications

References 4 publications

A Methodology for the Extraction of Reader's Emotional State Triggered from Text Typography

A Methodology for the Extraction of Reader's Emotional State Triggered from Text Typography

Auditory Accessibility of Metadata in Books: A Design for All Approach

Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora

Contact Info

Product

Resources

About