Text to visual synthesis with appearance models

Melenchón, Javier; Torre, Fernando De la; Iriondo, Ignasi; Alías, Francesc; Martínez, Elisa; Vicent, L.

doi:10.1109/icip.2003.1246942

Cited by 4 publications

(14 citation statements)

References 8 publications


“…The present work is our first approach to automatic emotional speech synthesis in Catalan with the purpose of including emotional expressivity in the output channel of an HCI system [7] [8]. Catalan is the native language of Catalonia, the Valencian Country and the Balearic Islands (central east and north east part of Spain), which is spoken by more than 6 million people.…”

Section: Introduction (mentioning)

confidence: 99%

Modeling and Synthesizing Emotional Speech for Catalan Text-to-Speech Synthesis

Iriondo, Alías, Melenchón et al. 2004

Lecture Notes in Computer Science


Abstract. This paper describes an initial approach to emotional speech synthesis in Catalan based on a diphone concatenation TTS system. The main goal of this work is to develop a simple prosodic model for expressive synthesis. This model is obtained from an emotional speech collection artificially generated by means of a copy-prosody experiment. After validating the emotional content of this collection, the model was automated and incorporated into our TTS system. Finally, the automatic speech synthesis system has been evaluated by means of a perceptual test, obtaining encouraging results.



“…The short sequence consists of 316 images and it has been used to compare the results obtained from our On-the-fly Training Algorithm and its previous non-causal version [7]. Achieving the same quality in the results (see Fig.…”

Section: On-the-fly Training Algorithm (mentioning)

confidence: 85%

“…First of all, the four masks $\pi^r$ are manually extracted from the first image the corresponding alignment coefficients $a_1$ are set to 0; they represent the affine transformation used to fit the masks onto the face on each frame [5]. Using the tracking algorithm presented in [7] (Figure 3) can be executed. Besides, only those columns of $U^r_{t+1}$ and $V^r_{t+1}$ whose values of $\Sigma^r_{t+1}$ exceed a threshold $\tau$ are considered, keeping only those eigenvectors with enough information.…”

Section: Training Process (mentioning)

confidence: 99%
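
The thresholding step mentioned in the statement above (keeping only the singular vectors whose singular values exceed a threshold) can be illustrated with a minimal NumPy sketch. The function name `truncate_svd`, the synthetic data and the value of `tau` are assumptions made for this example; they are not taken from the cited paper, which applies the rule per mask region and per time step.

```python
import numpy as np

def truncate_svd(U, s, Vt, tau):
    """Keep only the singular triplets whose singular value exceeds tau."""
    keep = s > tau                      # boolean mask over the singular values
    return U[:, keep], s[keep], Vt[keep, :]

# Hypothetical data: a rank-2 matrix plus very small noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))
A += 1e-6 * rng.standard_normal((50, 30))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = truncate_svd(U, s, Vt, tau=1e-3)

# Low-rank reconstruction from the retained components only.
A_k = (U_k * s_k) @ Vt_k
```

In this sketch the two dominant components survive the threshold and the noise-level ones are dropped, so `A_k` stays close to `A` while the stored basis is much smaller.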

“…Some non-intrusive visual trackers can be used in this scheme because they retain information regarding to position, scale, orientation and appearance of the tracked element, e.g. [5], [6], [7], [8] and [9]. Nevertheless, the whole sequence is needed by these algorithms to be processed off-line (they have a non-causal behaviour); as a result, a real time implementation of these methods is impossible, even without considering their computational cost.…”

Section: Introduction (mentioning)

confidence: 99%


Simultaneous and Causal Appearance Learning and Tracking

Melenchón, Iriondo, Meler 2009

Series in Machine Perception and Artificial Intelligence

Self Cite


A novel way to learn and track simultaneously the appearance of a previously non-seen face without intrusive techniques can be found in this article. The presented approach has a causal behaviour: no future frames are needed to process the current ones. The model used in the tracking process is refined with each input frame thanks to a new algorithm for the simultaneous and incremental computation of the singular value decomposition (SVD) and the mean of the data. Previously developed methods about iterative computation of SVD are taken into account and an original way to extract the mean information from the reduced SVD of a matrix is also considered. Furthermore, the results are produced with linear computational cost and sublinear memory requirements with respect to the size of the data. Finally, experimental results are included, showing the tracking performance and some comparisons between the batch and our incremental computation of the SVD with mean information.
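
The abstract above does not spell out its update rule, so the following is only a sketch of one generic way to update a thin SVD and the data mean together when new columns arrive. It is not the authors' algorithm. It relies on the scatter identity S_C = S_A + S_B + (nm/(n+m)) (mean_A - mean_B)(mean_A - mean_B)^T, which lets the mean shift be absorbed as one extra column. The function name `update_svd_with_mean`, the tolerance `tol` and the random test data are assumptions made for illustration.

```python
import numpy as np

def update_svd_with_mean(U, s, mean, n, B, tol=1e-10):
    """Fold m new columns B into a thin SVD (U, s) of mean-centred data.

    U, s  : left singular vectors / values of the n centred columns seen so far
    mean  : (d, 1) mean of those n columns
    Returns the updated U, s, mean and column count.
    """
    m = B.shape[1]
    mean_B = B.mean(axis=1, keepdims=True)
    new_mean = (n * mean + m * mean_B) / (n + m)

    # Centre the new block on its own mean and append one column that
    # accounts for the shift between the old and the new block means.
    correction = np.sqrt(n * m / (n + m)) * (mean_B - mean)
    B_hat = np.hstack([B - mean_B, correction])

    # Split B_hat into the part inside span(U) and the residual outside it.
    proj = U.T @ B_hat
    resid = B_hat - U @ proj

    # Orthonormal basis of the residual; rank-deficient directions are dropped.
    Q, r, _ = np.linalg.svd(resid, full_matrices=False)
    Q = Q[:, r > tol * max(r.max(), 1.0)]

    # Small matrix K with [U diag(s), B_hat] = [U, Q] @ K.
    K = np.block([
        [np.diag(s), proj],
        [np.zeros((Q.shape[1], s.size)), Q.T @ resid],
    ])
    U_K, s_new, _ = np.linalg.svd(K, full_matrices=False)
    U_new = np.hstack([U, Q]) @ U_K
    return U_new, s_new, new_mean, n + m

# Hypothetical data: 10 initial columns, then 5 more.
rng = np.random.default_rng(0)
d, n, m = 20, 10, 5
A = rng.standard_normal((d, n))
B = rng.standard_normal((d, m))

mean_A = A.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(A - mean_A, full_matrices=False)
U_inc, s_inc, mean_inc, count = update_svd_with_mean(U, s, mean_A, n, B)

# Batch reference: SVD of all 15 columns centred on their joint mean.
C = np.hstack([A, B])
mean_C = C.mean(axis=1, keepdims=True)
s_batch = np.linalg.svd(C - mean_C, compute_uv=False)
```

Without truncation the incremental singular values agree with the batch ones up to rounding. A tracker of the kind described would additionally discard small singular values after each update to keep memory bounded, which makes the result an approximation.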


“…The visual information is extracted from the recorded image sequence using the registration algorithm presented in [18]. This algorithm takes as input the recorded image sequence and a set of masks and returns a set of orthonormal bases B (PSFAM) and a matrix of coefficients C with columns c i .…”

Section: Visual Information (mentioning)

confidence: 99%
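
To make the roles of the orthonormal basis B and the coefficient matrix C in the statement above concrete, here is a minimal sketch under simplifying assumptions: a single basis rather than one per mask, no mean term, and synthetic data in place of registered image regions. None of the names or sizes below come from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 64, 5, 12   # hypothetical sizes: pixels, basis vectors, frames

# Hypothetical orthonormal basis B (d x k), obtained here via QR.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Synthetic frames that lie exactly in the span of B.
C_true = rng.standard_normal((k, n))
X = B @ C_true

# With orthonormal columns, the coefficients are plain projections,
# and each frame is rebuilt as x_i = B @ c_i.
C = B.T @ X
X_rec = B @ C
```

Each column c_i of `C` is a compact description of one frame, which is what makes such coefficients convenient targets for a later mapping from audio features.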

Lip animation of a personalized facial model from auditory speech

Melenchón, Iriondo, Socoró et al.

Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (IEEE Cat. No.03EX795)

Self Cite


This paper proposes a new method for lip animation of personalized facial model from auditory speech. It is based on Bayesian estimation and person specific appearance models (PSFAM). Initially, a video of a speaking person is recorded from which the visual and acoustic features of the speaker and their relationship will be learnt. First, the visual information of the speaker is stored in a color PSFAM by means of a registration algorithm. Second, the auditory features are extracted from the waveform attached to the recorded video sequence. Third, the relationship between the learnt PSFAM and the auditory features of the speaker is represented by Bayesian estimators. Finally, subjective perceptual tests are reported in order to measure the intelligibility of the preliminary results synthesizing isolated words.

