2023
DOI: 10.1109/taffc.2021.3064601

Self-Supervised Learning of Person-Specific Facial Dynamics for Automatic Personality Recognition

Abstract: This paper aims to solve two important issues that frequently occur in existing automatic personality analysis systems: 1. Attempting to use very short video segments or even single frames, rather than long-term behaviour, to infer personality traits; 2. Lack of methods to encode person-specific facial dynamics for personality recognition. To deal with these issues, this paper firstly proposes a novel Rank Loss which utilizes the natural temporal evolution of facial actions, rather than personality labels, for…

Citations: cited by 33 publications (27 citation statements)
References: 90 publications (130 reference statements)
“…Novelty: The main novelties of the proposed approach are summarised as follows: firstly, we propose to use the simulated human cognition as the source descriptor to recognise true personality traits, which differs from existing approaches [7,16,33,52,70,80,94,98] that predict apparent personality traits directly from target subjects' expressive behaviours. Secondly, we propose the first non-invasive approach that simulates human person-specific cognitive processes that relate to facial reactions.…”
Section: Methods (mentioning)
confidence: 99%
“…In summary, while modelling personality traits at the frame/segment-level is problematic, the recent clip-level representations usually failed to utilise the full scale of the available information in the data, as they select a subset or key frames to represent an entire video. To avoid these problems, Song et al. [80] propose a domain adaption approach to learn a set of intermediate convolution layers from all available data as the person-specific representation for the target subject, which achieved a comparable performance to the state-of-the-art method [52]. However, similar to the approaches described above, it still directly infers apparent personality based on the subjects' observable behaviours.…”
Section: Audio-visual Automatic Personality Analysis (mentioning)
confidence: 99%
“…Each ERN used in our experiments is made up of two

Methods                  Ope      Con      Ext      Agr      Neu      Avg.
ACC
Spectral [45]            0.752    0.807    0.849    0.800    0.788    0.799
DCC [18]                 0.755    0.787    0.772    0.736    0.791    0.768
NJU-LAMDA [51]           0.741    0.826    0.827    0.753    0.789    0.787
CR-Net [29]              0.830    0.876    0.904    0.887    0.903    0.880
PALs [44]                0.845    0.819    0.916    0.837    0.911    0.866
Ours (A-MModal (S))      0.833    0.890    0.913    0.869    0.917    0.884
Ours (MModal (M))        0.889    0.925    0.923    0.913    0.921    0.914
Ours (A-MModal (M))      0.882    0.925    0.931    0.912    0.925    0.915
PCC
Spectral [45]           -0.010    0.059    0.135    0.071    0.024    0.056
DCC [18]                -0.153   -0.078    0.037   -0.024    0.121    0.008
NJU-LAMDA [51]

MModal denotes the graph representations of multi-modal processors. (M) and (S) represent the multi-level and single-level fusion, respectively.…”
Section: Implementation Details (mentioning)
confidence: 99%
“…The problem with these approaches is that at the level of a single frame or short segment, even people with different personality traits may display very similar non-verbal audio-visual behaviours. Therefore, these training strategies would end up utilising the same input pattern with multiple labels, making it practically impossible to train a model that has a good generalization capability [44,45,47] (Problem 2). Although some approaches select a set of key frames to represent an entire video and infer personality from such video-level representations [4,29,53], they ignore the details contained in the discarded frames (Problem 3).…”
Section: Introduction (mentioning)
confidence: 99%