Computational paralinguistics aims to infer human emotions, personality traits and behavioural patterns from speech signals. Verbal conflict, in particular, is an important example of human-interaction behaviour whose detection would enable monitoring and feedback in a variety of applications. Most methods for detecting verbal conflict and estimating its intensity apply off-the-shelf classifiers or regressors to generic handcrafted acoustic features. Generating conflict-specific features requires refinement steps and the availability of metadata, such as the number of speakers and the duration of their speech overlap. Moreover, most techniques treat feature extraction and regression as independent modules, which require separate training and parameter tuning. To address these limitations, we propose the first end-to-end convolutional-recurrent neural network architecture that learns conflict-specific features directly from raw speech waveforms, without explicit domain knowledge or metadata. Additionally, to selectively focus the model on portions of speech containing verbal conflict, we include a global attention interface that learns the alignment between layers of the recurrent network. Experimental results on the SSPNet Conflict Corpus show that our end-to-end architecture achieves state-of-the-art performance in terms of Pearson Correlation Coefficient.
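To make the described pipeline concrete, the following is a minimal PyTorch sketch of an end-to-end conflict-intensity regressor: strided 1-D convolutions over the raw waveform, a recurrent layer, and a global attention pooling feeding a scalar regression head. All layer sizes, the single-GRU simplification, and the dot-product attention pooling are illustrative assumptions, not the paper's configuration (the paper's attention aligns layers of the recurrent network, which this sketch does not reproduce).

```python
import torch
import torch.nn as nn

class ConflictNet(nn.Module):
    """Illustrative conv-recurrent sketch with global attention pooling.
    Hyperparameters are arbitrary assumptions, not the paper's values."""

    def __init__(self, hidden=64):
        super().__init__()
        # Strided 1-D convolutions act as a learned front end on raw audio,
        # replacing handcrafted acoustic features.
        self.frontend = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=80, stride=10), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True)
        # Learned query vector for global attention over recurrent states.
        self.query = nn.Parameter(torch.randn(hidden))
        self.head = nn.Linear(hidden, 1)

    def forward(self, wave):                    # wave: (batch, samples)
        x = self.frontend(wave.unsqueeze(1))    # (batch, 64, frames)
        h, _ = self.rnn(x.transpose(1, 2))      # (batch, frames, hidden)
        # Attention weights let the model focus on frames that carry
        # conflict cues (e.g., overlapping speech).
        scores = h @ self.query                 # (batch, frames)
        alpha = torch.softmax(scores, dim=1).unsqueeze(-1)
        context = (alpha * h).sum(dim=1)        # (batch, hidden)
        return self.head(context).squeeze(-1)   # conflict-intensity score

# Example: a batch of two 3-second clips at 8 kHz (an arbitrary choice).
model = ConflictNet()
score = model(torch.randn(2, 24000))
print(score.shape)  # torch.Size([2])
```

Because the convolutional front end, the recurrent layer and the attention pooling are composed in a single differentiable graph, the whole model can be trained jointly with one regression loss, avoiding the separate training and tuning of feature-extraction and regression modules noted above.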