Proceedings of the 14th ACM International Conference on Multimodal Interaction 2012
DOI: 10.1145/2388676.2388781

Combining video, audio and lexical indicators of affect in spontaneous conversation via particle filtering

Abstract: We present experiments on fusing facial video, audio and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression based affect estimators for each modality. The single modality regressors are then combined using particle filtering, by treating these independent regression outputs as measurements of the affect states in a Bayesian filtering framework, where previous observations provide prediction about the current state by means of learned affect dynamics.
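
A minimal sketch of the fusion idea the abstract describes, assuming a single scalar affect dimension, first-order linear (AR(1)) affect dynamics, and independent Gaussian measurement noise per modality; all parameter values and names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 500            # number of particles (assumed)
a = 0.95           # AR(1) coefficient standing in for learned affect dynamics (assumed)
q = 0.05           # process-noise std dev (assumed)
r = {"video": 0.20, "audio": 0.25, "lexical": 0.30}  # per-modality measurement noise (assumed)

particles = rng.normal(0.0, 1.0, size=N)   # initial hypotheses about the affect state
weights = np.full(N, 1.0 / N)

def fuse_step(measurements):
    """One filtering step: predict with the dynamics model, weight each
    particle by how well it explains the single-modality regressor
    outputs, then resample. `measurements` maps modality -> regression output."""
    global particles, weights
    # Predict: propagate particles through the affect-dynamics model.
    particles = a * particles + rng.normal(0.0, q, size=N)
    # Update: treat each modality's regression output as an independent
    # noisy measurement of the latent affect state.
    for modality, z in measurements.items():
        weights = weights * np.exp(-0.5 * ((z - particles) / r[modality]) ** 2)
    weights /= weights.sum()
    # Resample to avoid weight degeneracy.
    idx = rng.choice(N, size=N, p=weights)
    particles = particles[idx]
    weights = np.full(N, 1.0 / N)
    # Fused affect estimate is the posterior mean.
    return float(particles.mean())

# Example: fuse one frame's per-modality regressor outputs.
estimate = fuse_step({"video": 0.4, "audio": 0.1, "lexical": 0.3})
print(estimate)
```

In the paper's framework the dynamics are learned from data and the state covers the annotated affect dimensions; the sketch fixes both by hand only to keep the example self-contained.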

Cited by 52 publications (36 citation statements)
References 19 publications
“…We showed that LZM and QLZM representations are useful, but the most appropriate representation for affect recognition in posed and naturalistic settings may differ (unlike the common practice: [33] vs. [30,32]). …”
Section: Results (mentioning, confidence: 94%)
“…Table 1 shows the cross-correlation coefficients between predicted and ground-truth labels, averaged over all sequences. We include the baseline result from [37] and the results of the top four contenders from AVEC 2012 [27,41,36,30]. We also include our results on audio-visual input (bottom row), discussed in Section 4.4.…”
Section: Methods (mentioning, confidence: 99%)
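
The metric quoted above is the Pearson cross-correlation between a predicted and a ground-truth label trace, averaged over all test sequences. A minimal sketch of that computation (the function name and the toy sequences are illustrative, not from the paper):

```python
import numpy as np

def sequence_correlation(pred, truth):
    # Pearson cross-correlation between one predicted and one ground-truth
    # continuous affect trace (e.g., per-frame arousal labels).
    return np.corrcoef(pred, truth)[0, 1]

# Average over all test sequences, as in the quoted evaluation (toy data).
pairs = [
    (np.array([0.1, 0.3, 0.2, 0.5]), np.array([0.0, 0.4, 0.1, 0.6])),
    (np.array([0.2, 0.1, 0.4, 0.3]), np.array([0.3, 0.0, 0.5, 0.2])),
]
mean_score = np.mean([sequence_correlation(p, t) for p, t in pairs])
print(mean_score)
```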
“…al. [20] use temporal statistics of texture descriptors extracted from facial videos, a combination of various acoustic features, and lexical features to create regression based affect estimators for each modality. The single modality regressors are then combined using particle filtering, by treating these independent regression outputs as measurements of the affective states in a Bayesian filtering framework, where previous observations provide prediction about the current state by means of learned affect dynamics.…”
Section: Related Work (mentioning, confidence: 99%)