Abstract-The ability to automatically detect the extent of agreement or disagreement a person expresses is important for analyzing inter-personal relations and emotion expression. Most existing methods for automated analysis of human agreement from audio-visual data perform agreement detection using either the audio or the visual modality of human interactions. However, this is suboptimal, as the expression of different agreement levels comprises both facial and vocal cues specific to the target level. To address this, we propose the first approach for multi-modal estimation of agreement intensity levels. Specifically, our model leverages the feature-representation power of multimodal Neural Networks (NNs) and the discriminative power of Conditional Ordinal Random Fields (CORFs) to achieve dynamic classification of agreement levels from videos. We show on the MAHNOB-Mimicry database of dyadic human interactions that the proposed approach outperforms its uni-modal and linear counterparts, as well as related models applicable to the target task.
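To make the architectural idea stated in the abstract concrete, the following minimal sketch (not the authors' implementation; all dimensions, weights, and names are hypothetical) illustrates feature-level fusion of audio and visual descriptors by a small neural network, followed by an ordinal, threshold-based read-out of agreement levels, i.e., the ordered-output structure that CORF-type models enforce. The temporal (chain) dynamics of the CORF are omitted here for brevity.

```python
# Illustrative sketch only (not the paper's code): fuse per-frame audio and
# visual features with an untrained feed-forward layer, then map the fused
# representation to ordinal agreement levels via cumulative thresholds.
# All dimensions, weights, and variable names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

T, D_AUDIO, D_VISUAL, D_HIDDEN, N_LEVELS = 100, 40, 60, 32, 5

# Per-frame features for one video segment (stand-ins for real descriptors).
audio_feats = rng.normal(size=(T, D_AUDIO))
visual_feats = rng.normal(size=(T, D_VISUAL))

# Feature-level fusion followed by one hidden layer (random, untrained weights).
W_fuse = rng.normal(scale=0.1, size=(D_AUDIO + D_VISUAL, D_HIDDEN))
w_out = rng.normal(scale=0.1, size=D_HIDDEN)
fused = np.tanh(np.concatenate([audio_feats, visual_feats], axis=1) @ W_fuse)
scores = fused @ w_out  # one latent score per frame

# Ordinal read-out: sorted cumulative thresholds partition the real line into
# N_LEVELS ordered bins, mirroring the ordinal structure of CORF-type models.
thresholds = np.sort(rng.normal(size=N_LEVELS - 1))
levels = np.sum(scores[:, None] > thresholds[None, :], axis=1)

print(levels[:10])  # per-frame agreement levels in {0, ..., N_LEVELS - 1}
```

In practice, the fusion weights and thresholds would be learned jointly, and a CORF layer would additionally model transitions between the ordinal levels of neighboring frames.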