2020
DOI: 10.1007/s11263-020-01309-y

CR-Net: A Deep Classification-Regression Network for Multimodal Apparent Personality Analysis

Cited by 37 publications (30 citation statements)
References 40 publications
“…Then, a consensus strategy is employed to process all the selected frames and produce video-level personality predictions. Li et al [29] also select a face image and a face-background image from each segment and stack them as the clip-level stream. This approach converts the acoustic wave of an entire clip to fixed-length vectors as the clip-level audio representation.…”
Section: Related Work 2.1 Automatic Audio-Visual Personality Analysis
confidence: 99%
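
The sampling-and-consensus pipeline described in this statement can be sketched roughly as follows. This is an illustrative reconstruction only: the frame encoder, the number of sampled segments, and the use of simple averaging as the consensus function are assumptions, not details taken from CR-Net or the citing paper.

import torch
import torch.nn as nn

class SegmentConsensusPredictor(nn.Module):
    """Predict video-level Big-Five scores from one sampled frame per segment."""

    def __init__(self, frame_encoder: nn.Module, num_traits: int = 5):
        super().__init__()
        self.frame_encoder = frame_encoder      # any CNN that maps a frame to a feature vector
        self.head = nn.LazyLinear(num_traits)   # per-frame trait regression head

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_segments, C, H, W), one frame selected from each segment
        b, s = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1))   # (batch * num_segments, feat_dim)
        scores = self.head(feats).view(b, s, -1)           # clip-level trait scores
        return scores.mean(dim=1)                          # consensus: average over segments

A parallel audio branch would map the clip's whole waveform to a fixed-length vector and fuse it with the visual stream, as the quote describes; that part is omitted from this sketch.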
“…Each ERN used in our experiments is made up of two

Methods                 Ope     Con     Ext     Agr     Neu     Avg.
ACC
Spectral [45]           0.752   0.807   0.849   0.800   0.788   0.799
DCC [18]                0.755   0.787   0.772   0.736   0.791   0.768
NJU-LAMDA [51]          0.741   0.826   0.827   0.753   0.789   0.787
CR-Net [29]             0.830   0.876   0.904   0.887   0.903   0.880
PALs [44]               0.845   0.819   0.916   0.837   0.911   0.866
Ours (A-MModal (S))     0.833   0.890   0.913   0.869   0.917   0.884
Ours (MModal (M))       0.889   0.925   0.923   0.913   0.921   0.914
Ours (A-MModal (M))     0.882   0.925   0.931   0.912   0.925   0.915
PCC
Spectral [45]           -0.010  0.059   0.135   0.071   0.024   0.056
DCC [18]                -0.153  -0.078  0.037   -0.024  0.121   0.008
NJU-LAMDA [51]

MModal denotes the graph representations of multi-modal processors. (M) and (S) represent the multi-level and single-level fusion, respectively.…”
Section: Implementation Details
confidence: 99%
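
For reference, the two metrics reported in the table above can be computed as below, assuming the usual ChaLearn First Impressions conventions: ACC as the mean accuracy, i.e. one minus the mean absolute error over trait scores in [0, 1], and PCC as the per-trait Pearson correlation coefficient. These definitions are an assumption about the table, not stated in the quote itself.

import numpy as np

def mean_accuracy(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    # pred, target: (num_videos, 5) Big-Five scores in [0, 1]; returns per-trait ACC
    return 1.0 - np.abs(pred - target).mean(axis=0)

def pearson_cc(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    # per-trait Pearson correlation between predictions and ground truth
    return np.array([np.corrcoef(pred[:, t], target[:, t])[0, 1]
                     for t in range(pred.shape[1])])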
“…Recent advances in machine learning (ML) have enabled the development of non-invasive automatic personality trait analysers that recognise subjects' personality traits from their audiovisual non-verbal behaviours [16,28,52,80,90,98], as there is solid psychological and biological evidence [19,27,48,95] that non-verbal behaviours are reliable predictors of personality. In most of these approaches, ML models are trained with the personality labels provided by external observers (annotators), and they therefore output their perception of the target subjects' personality.…”
Section: Introduction
confidence: 99%