2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)
DOI: 10.1109/fg47880.2020.00053
Latent-Based Adversarial Neural Networks for Facial Affect Estimations

Abstract: There is a growing interest in affective computing research nowadays, given its crucial role in bridging humans with computers. This progress has recently been accelerated by the emergence of bigger datasets. One recent advance in this field is the use of adversarial learning to improve model learning through augmented samples. However, the use of latent features, which is feasible through adversarial learning, has not been widely explored yet. This technique may also improve the performance of affective models,…

Cited by 9 publications (22 citation statements) | References 27 publications
“…This phenomenon (i.e., the advantage of using multiple inputs (or metrics) over a single input) has been observed in the field of machine learning, which is a sub-field of artificial intelligence. Specifically, it was demonstrated that using multiple modality inputs (i.e., multi-modal, or in our case, multiple system inputs (or metrics)) produces more accurate predictions than using a single modality input (i.e., uni-modal, or in our case, single-system metric input) both in visual [42,43] and bio-signal analyses [44]. As a result of this discovery, we chose to use multiple system inputs (or metrics) in order to improve prediction accuracy, as we demonstrate later in the experiment results (Section 4.4.2).…”
Section: Discussion (mentioning)
confidence: 99%
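As a rough illustration of the fusion idea in the statement above, combining several input streams can be as simple as encoding each stream separately and concatenating the features before a shared predictor. The class name, dimensions, and late-fusion scheme below are assumptions made for this sketch, not details taken from the cited works.

```python
# Minimal sketch of multi-input (late-fusion) prediction versus a single input;
# the dimensions and concatenation scheme are illustrative assumptions only.
import torch
import torch.nn as nn

class FusionRegressor(nn.Module):
    def __init__(self, dims=(32, 16, 8), out_dim=1):
        super().__init__()
        # One small encoder per input stream (e.g. per system metric or modality).
        self.encoders = nn.ModuleList(nn.Linear(d, 16) for d in dims)
        self.head = nn.Linear(16 * len(dims), out_dim)

    def forward(self, inputs):
        feats = [torch.relu(enc(x)) for enc, x in zip(self.encoders, inputs)]
        return self.head(torch.cat(feats, dim=-1))   # fuse by concatenation

# Three input streams for a batch of 4 samples:
x = [torch.randn(4, 32), torch.randn(4, 16), torch.randn(4, 8)]
pred = FusionRegressor()(x)                          # (4, 1)
```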
“…Given input image I which contains the facial area, both G and D will be responsible to learn low dimensional features that the combiner will use to estimate the associated Valence (V) and Arousal (A) state θ. The architecture of both the G and D networks follows the recent work from (Aspandi et al, 2020), and we propose to use LSTM enhanced with attention to create our C network. We proposed two main architecture variants: the ANCLaF network (left part of Figure 1), which uses single images as input and estimates V and A values independently for each frame, and ANCLaF-S and ANCLaF-SA (right part of Figure 1) that uses sequences of latent features extracted from n frames as input, and utilises LSTM RNNs for the inference (-S), optionally combined with internal attention layers (-SA).…”
Section: Methods (mentioning)
confidence: 99%
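For readers unfamiliar with this setup, the following is a minimal sketch of such a latent-feature combiner: per-frame latents from G and D are concatenated, passed through an LSTM, pooled with a simple attention layer, and regressed to valence/arousal. The class name, layer sizes, and attention form are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch (assumed shapes and names) of a combiner C that takes per-frame
# latent features from G and D and regresses valence/arousal over a sequence,
# roughly in the spirit of the "-SA" variant described above.
import torch
import torch.nn as nn

class LatentCombiner(nn.Module):
    def __init__(self, latent_dim=256, hidden_dim=128):
        super().__init__()
        # G and D latents are concatenated per frame before temporal modelling.
        self.lstm = nn.LSTM(2 * latent_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)          # scalar score per time step
        self.head = nn.Linear(hidden_dim, 2)          # -> (valence, arousal)

    def forward(self, z_g, z_d):
        # z_g, z_d: (batch, n_frames, latent_dim) latent features from G and D
        z = torch.cat([z_g, z_d], dim=-1)
        h, _ = self.lstm(z)                           # (batch, n_frames, hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)        # attention weights over frames
        pooled = (w * h).sum(dim=1)                   # attention-weighted summary
        return torch.tanh(self.head(pooled))          # V/A typically lie in [-1, 1]

# Usage with dummy latents for a batch of 4 sequences of 8 frames:
z_g = torch.randn(4, 8, 256)
z_d = torch.randn(4, 8, 256)
va = LatentCombiner()(z_g, z_d)                       # (4, 2)
```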
“…While the specified length of temporal modelling has been shown to affect the final results on other related facial analysis tasks (Kossaifi et al, 2017; Xia et al, 2020; Farhadi and Fox, 2018; Aspandi et al, 2019b), the computational cost required to train large spatio-temporal models hampers one to address such analysis. However, these problems could be mitigated by: 1) the use of progressive sequence learning to permit stepwise observations of various sequence lengths; this approach has been shown in the recent work of (Aspandi et al, 2019b) on facial landmark estimations, which uses curriculum learning enabling more robust training analysis and tuning of the temporal length; 2) the use of reduced feature sizes, enabling more efficient training process (Comas et al, 2020); this has been explored in the affective computing field by the recent works such as (Aspandi et al, 2020), which uses generative modelling to extract a latent space of representative features. These two aspects have inspired us to propose the combined models presented in this work, as explained in the next section.…”
Section: Related Work (mentioning)
confidence: 99%
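A rough sketch of the progressive sequence-learning idea mentioned in point 1 of the statement above: train on short temporal windows first, then on progressively longer ones. The stage schedule, toy model, and dummy data below are assumptions for illustration, not the curriculum used in (Aspandi et al, 2019b).

```python
# Illustrative sketch of progressive sequence-length (curriculum) training:
# the model first sees short clips, then progressively longer ones.
import torch
import torch.nn as nn

def curriculum_lengths(max_len=32, stages=4):
    # Progressively longer temporal windows, e.g. [8, 16, 24, 32] for max_len=32.
    step = max_len // stages
    return [step * (i + 1) for i in range(stages)]

# Toy per-frame regressor standing in for a spatio-temporal affect model.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

sequences = torch.randn(16, 32, 64)   # (clips, max_len, feature_dim) dummy latents
targets = torch.randn(16, 32, 2)      # per-frame valence/arousal labels (dummy)

for seq_len in curriculum_lengths(sequences.shape[1]):
    for _ in range(5):                                # a few epochs per stage
        optimiser.zero_grad()
        pred = model(sequences[:, :seq_len])          # truncate to the stage length
        loss = loss_fn(pred, targets[:, :seq_len])
        loss.backward()
        optimiser.step()
```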
“…Firstly, we present an annotated dataset, the Game Lie Dataset (GLD), based on frontal facial recordings of 19 participants who try their best to fool their opponents in the liar card game. Secondly, we depart from the dominating trend of lie detection based on micro-expressions and investigate whether a lie can be detected by analyzing solely the facial patterns contained on single images as input to cutting-edge machine learning [13][14][15] and deep learning [16][17][18][19] facial analysis algorithms.…”
Section: Introduction (mentioning)
confidence: 99%