2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018)
DOI: 10.1109/fg.2018.00062

Online Attention for Interpretable Conflict Estimation in Political Debates

Abstract: Conflict arises naturally in dyadic interactions when involved individuals act on incompatible goals, interests, or actions. In this paper, the problem of conflict intensity estimation from audiovisual recordings is addressed. To this end, we propose an online attention-based neural network in order to learn a mapping from a sequence of audiovisual features to time-series describing conflict intensity. The proposed method is evaluated by conducting experiments in conflict intensity estimation by employing the …
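The abstract describes an online attention model that maps a sequence of audiovisual features to a conflict-intensity time-series. As a rough illustration only, the sketch below assumes a PyTorch implementation in which attention at each step is restricted to frames already seen (the "online" constraint); the layer sizes, the 65-dimensional input (borrowed from the citation statements below), and the causal-attention formulation are assumptions, not the authors' exact architecture.

# Minimal sketch of an online (causal) attention regressor for conflict
# intensity, assuming PyTorch. Layer sizes and the attention formulation
# are illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class OnlineAttentionRegressor(nn.Module):
    def __init__(self, feat_dim=65, hidden_dim=128):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)   # attention energy per frame
        self.out = nn.Linear(hidden_dim, 1)     # conflict intensity per step

    def forward(self, x):
        # x: (batch, time, feat_dim) audiovisual feature sequence
        h, _ = self.encoder(x)                   # (batch, time, hidden)
        energy = self.score(h).squeeze(-1)       # (batch, time)
        preds = []
        for t in range(h.size(1)):
            # "online": attend only over frames seen so far (0..t)
            w = torch.softmax(energy[:, : t + 1], dim=1)     # (batch, t+1)
            ctx = (w.unsqueeze(-1) * h[:, : t + 1]).sum(1)   # (batch, hidden)
            preds.append(self.out(ctx))
        return torch.cat(preds, dim=1)           # (batch, time) intensities

model = OnlineAttentionRegressor()
feats = torch.randn(2, 100, 65)                  # two clips, 100 frames each
print(model(feats).shape)                        # torch.Size([2, 100])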

Cited by 1 publication (2 citation statements)
References: 20 publications
“…Recently, a multi-modal conflict estimation method used a concatenation of audio and visual features as input to an LSTM-based encoder-decoder architecture with attention. This method focuses on visual features (facial gestures) and uses 65 audio Low-Level Descriptor (LLD) features, sampled at 25 Hz [23]. While hand-crafted features may facilitate interpretation of the specific characteristics of the speech signal used as predictors, we aim to explore whether an end-to-end learning framework can be used for a complex paralinguistic task such as verbal conflict intensity estimation by automatically learning the acoustic features relevant to this task.…”
Section: Related Work
confidence: 99%
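The quoted statement specifies the model input: 65 audio LLDs sampled at 25 Hz, concatenated with visual (facial) features. A minimal sketch of that frame-wise fusion, assuming NumPy arrays already at the shared 25 Hz frame rate; the visual feature dimensionality (40) is a hypothetical placeholder.

# Hedged sketch of the input preparation the quote describes: concatenating
# 65 audio Low-Level Descriptors with visual features at a common 25 Hz
# frame rate. Array shapes and the visual dimensionality are assumptions.
import numpy as np

fps = 25                                      # frame rate cited in the quote
n_frames = 10 * fps                           # e.g. a 10-second segment
audio_lld = np.random.randn(n_frames, 65)     # 65 audio LLDs per frame
visual = np.random.randn(n_frames, 40)        # hypothetical facial features

# Frame-wise concatenation yields the multi-modal input sequence that an
# LSTM encoder-decoder with attention would consume.
fused = np.concatenate([audio_lld, visual], axis=1)
print(fused.shape)                            # (250, 105)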
“…Ref  | Features      | Method
…    | …             | speech overlap ratio using BLSTM DNN
[9]  | IS13 & over.  | forward-backward pass SVR
[3]  | IS10 & IS13   | overlap detection using SVR; SVM + backward selection
[23] | FPF & LLD     | LSTM based encoder-decoder network
[20] | raw speech    | End-to-End Convolutional Neural Network
Ours | raw speech    | End-to-End CRNN with attention…”
Section: Related Work
confidence: 99%
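The table's last two rows contrast end-to-end models that learn acoustic features directly from raw speech, with the citing work adding attention on top of a convolutional-recurrent stack. A minimal sketch of such a CRNN with attention, assuming PyTorch; every layer shape and hyper-parameter here is illustrative rather than taken from the cited papers.

# Minimal sketch of an "End-to-End CRNN with attention": 1-D convolutions
# learn acoustic features from raw speech, a recurrent layer models temporal
# context, and soft attention pools the sequence into a single
# conflict-intensity estimate. All hyper-parameters are assumptions.
import torch
import torch.nn as nn

class CRNNAttention(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=80, stride=16), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
        )
        self.rnn = nn.GRU(64, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, wav):
        # wav: (batch, samples) raw speech waveform
        f = self.conv(wav.unsqueeze(1))          # (batch, 64, time)
        h, _ = self.rnn(f.transpose(1, 2))       # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1)   # (batch, time, 1)
        ctx = (w * h).sum(dim=1)                 # attention-pooled summary
        return self.out(ctx).squeeze(-1)         # (batch,) intensity

model = CRNNAttention()
wav = torch.randn(2, 16000)                      # two 1-second 16 kHz clips
print(model(wav).shape)                          # torch.Size([2])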