2009
DOI: 10.1109/tasl.2008.2010213

Emphatic Visual Speech Synthesis

Abstract: The synthesis of talking heads has been a flourishing research area over the last few years. Since human beings have an uncanny ability to read people's faces, most related applications (e.g., advertising, video-teleconferencing) require absolutely realistic photometric and behavioral synthesis of faces. This paper proposes a person-specific facial synthesis framework that allows high realism and includes a novel way to control visual emphasis (e.g., level of exaggeration of visible articulatory movem…

Cited by 10 publications (4 citation statements)
References 47 publications (62 reference statements)
“…The suggested structure of corpus of Castilian Spanish used by Melenchón, Martínez, De La Torre, and Montero (2009) consisted of /CVCV/. Their purpose for using such structure was supported by a strong statement, which claims more than 80% of Castilian Spanish words flow /CV/ structure (de Vega, Álvarez, & Carreiras, 1992).…”
Section: Corpus Design (mentioning)
confidence: 98%
“…The required input information might be derived from an automatic speech recognition (ASR) system if the corresponding acoustic speech is available, or existing (acoustic) text-to-speech synthesis rules can generate the required phoneme and timing sequence. Trajectory formation models have included concatenation [22], [24], [28]-[30], [43]-[46]; interpolation [3], [25], [26], [47]-[49]; probabilistic approaches [20], [23], [50]; and hybrid approaches [27], [51].…”
Section: Related Work (mentioning)
confidence: 99%
“…Multimodal affective recognition and synthesis deal with the determination and the simulation of multimodal expressiveness, respectively [1]. In MMHCI, the latter is typically conducted by talking-heads, whose research is mainly focused on the generation of realistic-looking (i.e., human-like) affective avatars [2]-[4]. Talking-heads may be used as a front-end in multimedia applications such as virtual operators, help desks, education tutors, etc.…”
Section: Introduction (mentioning)
confidence: 99%
“…Unit-selection text-to-speech (US-TTS) synthesis [6], which is based on the selection and concatenation of prerecorded speech units coming from a large speech database, is one of the dominant speech synthesis techniques [7]. Although there are several talking-heads including US-TTS (e.g., [4], [5]), there is still no significant research on including large affective speech corpora for the generation of their synthetic speech (e.g., affective speech is obtained from a diphone TTS by prosodic transformation rules [3] or through interactive control from only 1 h speech corpus containing read text [2]). One of the main reasons to this fact is the difficulty of obtaining accurate and reliable labels when dealing with large speech corpora, which become crucial to achieve high-quality synthetic speech [8], [9].…”
Section: Introduction (mentioning)
confidence: 99%