Region Based Adversarial Synthesis of Facial Action Units

Liu, Zhilei; Liu, Diyi; Wu, Yunpeng

doi:10.1007/978-3-030-37734-2_42

Cited by 15 publications

(10 citation statements)

References 25 publications

(35 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…As one of the most comprehensive ways to describe facial movements, Facial Action Coding System (FACS) has recently attracted widespread attention [20,21,25]. Pumarola et al [25] proposed an AU-based face editing system, which uses AU intensity labels to edit the input face to generate a face with specific facial muscles action.…”

Section: Au-based Face Editingmentioning

confidence: 99%

See 1 more Smart Citation

Talking Head Generation with Audio and Speech Related Facial Action Units

Chen¹,

Liu²,

Liu³

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

The task of talking head generation is to synthesize a lip synchronized talking head video by inputting an arbitrary face image and audio clips. Most existing methods ignore the local driving information of the mouth muscles. In this paper, we propose a novel recurrent generative network that uses both audio and speech-related facial action units (AUs) as the driving information. AU information related to the mouth can guide the movement of the mouth more accurately. Since speech is highly correlated with speech-related AUs, we propose an Audio-to-AU module in our system to predict the speech-related AU information from speech. In addition, we use AU classifier to ensure that the generated images contain correct AU information. Frame discriminator is also constructed for adversarial training to improve the realism of the generated face. We verify the effectiveness of our model on the GRID dataset and TCD-TIMIT dataset. We also conduct an ablation study to verify the contribution of each component in our model. Quantitative and qualitative experiments demonstrate that our method outperforms existing methods in both image quality and lip-sync accuracy.Recently, talking head generation has attracted more and more attention in the fields of academic and industry, which is essential in the applications of human-computer interaction, film making, virtual reality, computer games, etc. This research explores how to generate a talking head video by inputting anyone's image as an identity image and driving information related to mouth movement, e.g., speech audio, and text.Before deep learning became popular, many researchers in early work relied on Hidden Markov Models (HMM) to capture the dynamic relationship between audio and lip motion

show abstract

Section: Au-based Face Editingmentioning

confidence: 99%

“…The value of AUs can either use binary classification to indicate whether these AUs are activated, or use intensity value to indicate activation intensity. FACS has attracted much attention in face editing [19,20,25], such as facial expression editing [25]. These works proved that AU information can be used to edit local facial regions.…”

mentioning

confidence: 99%

Talking Head Generation with Audio and Speech Related Facial Action Units

Chen¹,

Liu²,

Liu³

et al. 2021

Preprint

Self Cite

View full text Add to dashboard Cite

show abstract

“…GAN-based facial expression transfer: Recently GANs have received attention to transfer facial expressions from a source subject to a target subject. Existing work on GAN-based facial expression transfer approaches focus on generating facial images with discrete emotions [6][9], or the specified facial action units [30] [23]. Some of the GAN-based approaches specifically aim to guide their models with the facial geometry information.…”

Section: Related Workmentioning

confidence: 99%

Synthetic Expressions are Better Than Real for Learning to Detect Facial Actions

Niinuma

Ertuğrul

Cohn

et al. 2020

Preprint

View full text Add to dashboard Cite

Critical obstacles in training classifiers to detect facial actions are the limited sizes of annotated video databases and the relatively low frequencies of occurrence of many actions. To address these problems, we propose an approach that makes use of facial expression generation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and then trains a GAN-based network to synthesize novel images with facial action units of interest. To evaluate this approach, a deep neural network was trained on two separate datasets: One network was trained on video of synthesized facial expressions generated from FERA17; the other network was trained on unaltered video from the same database. Both networks used the same train and validation partitions and were tested on the test partition of actual video from FERA17. The network trained on synthesized facial expressions outperformed the one trained on actual facial expressions and surpassed current state-of-the-art approaches.

show abstract

“…Many face synthesis methods have noticeably shown promising results [2][3][4][5]. [2,3] showed advances in synthesizing single facial image expressing seven discrete emotions [6].…”

Section: Introductionmentioning

confidence: 99%

“…This is because the facial expression is more complicated and diverse to be considered as the emotional aspects [7]. [4,5] solved these lack of diversities, by proposing models for producing synthetic facial representations using diverse combinations of AUs. These methods utilized facial Action Units (AUs) [8].…”

Section: Introductionmentioning

confidence: 99%

Comprehensive Facial Expression Synthesis using Human-Interpretable Language

Hong

Kim

Lee

et al. 2020

Preprint

View full text Add to dashboard Cite

Recent advances in facial expression synthesis have shown promising results using diverse expression representations including facial action units. Facial action units for an elaborate facial expression synthesis need to be intuitively represented for human comprehension, not a numeric categorization of facial action units. To address this issue, we utilize human-friendly approach: use of natural language where language helps human grasp conceptual contexts. In this paper, therefore, we propose a new facial expression synthesis model from language-based facial expression description. Our method can synthesize the facial image with detailed expressions. In addition, effectively embedding language features on facial features, our method can control individual word to handle each part of facial movement. Extensive qualitative and quantitative evaluations were conducted to verify the effectiveness of the natural language.

show abstract

Region Based Adversarial Synthesis of Facial Action Units

Cited by 15 publications

References 25 publications

Talking Head Generation with Audio and Speech Related Facial Action Units

Talking Head Generation with Audio and Speech Related Facial Action Units

Synthetic Expressions are Better Than Real for Learning to Detect Facial Actions

Comprehensive Facial Expression Synthesis using Human-Interpretable Language

Contact Info

Product

Resources

About