Proceedings of the 20th ACM International Conference on Multimodal Interaction 2018
DOI: 10.1145/3242969.3264984

Predicting Engagement Intensity in the Wild Using Temporal Convolutional Network

Cited by 35 publications
(19 citation statements)
References: 12 publications
“…In the proposed clip-level method (see Section 5.2), adding the affect features to the behavioral features reduces MSE (second row of Table 7), showing the effectiveness of the affect states in engagement level regression. After adding affect features, the MSE of the proposed method is very close to [38] and [39].

Method | MSE
[17] | 0.1000
DFSTN [7] | 0.0736
body-pose features + LSTM [45] | 0.0717
eye, head-pose, and AUs features + TCN [39] | 0.0655
eye, head-pose, and AUs features + GRU [38] | 0.0671

As can be observed in Figure 6 (d), the behavioral features of the two videos in classes 2 and 3 are different from the video in class 1.…”
Section: Table (mentioning)
confidence: 86%
“…Different from the end-to-end approaches, in feature-based approaches, first, multi-modal handcrafted features are extracted from videos, and then the features are fed to a classifier or regressor to output engagement [6], [7], [8], [10], [11], [12], [16], [17], [38], [39], [40], [41], [42], [43], [44], [45]. Table 1 summarizes the literature of feature-based video engagement measurement approaches focusing on their features, machine-learning models, and datasets.…”
Section: Feature-based Video Engagement Measurement (mentioning)
confidence: 99%
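
The feature-based pipeline described in the statement above (extract handcrafted features from a video, then feed them to a regressor) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `clip_features`, `knn_regress`, and `mse` are hypothetical, the toy feature values are made up, and a nearest-neighbour regressor stands in for the LSTM, TCN, and GRU models used in the cited works.

```python
from statistics import mean, pstdev


def clip_features(frames):
    """Summarise per-frame feature vectors as one clip-level vector.

    Output is the per-feature mean followed by the per-feature
    (population) standard deviation.
    """
    columns = list(zip(*frames))
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


def knn_regress(train_x, train_y, query, k=1):
    """Predict engagement as the mean label of the k nearest training clips."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    ranked = sorted(zip(train_x, train_y), key=lambda xy: sq_dist(xy[0], query))
    return mean(y for _, y in ranked[:k])


def mse(predictions, targets):
    """Mean squared error between predicted and annotated engagement."""
    return mean((p - t) ** 2 for p, t in zip(predictions, targets))


# Toy, made-up data: each clip is a list of per-frame
# [gaze_angle, head_yaw] values (hypothetical features).
clips = [
    [[0.1, 0.0], [0.1, 0.2]],  # annotated as disengaged (0.0)
    [[0.9, 1.0], [0.7, 0.8]],  # annotated as engaged (1.0)
]
labels = [0.0, 1.0]

train_x = [clip_features(c) for c in clips]
query = clip_features([[0.8, 0.9], [0.8, 0.9]])
pred = knn_regress(train_x, labels, query, k=1)
print(pred, mse([pred], [1.0]))
```

The two stages are kept separate on purpose: the feature summary can be swapped for other handcrafted descriptors, and the regressor for a sequence model, without changing how the MSE is computed.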
“…Multimodal learning analytics has been the subject of increasing attention in recent years and has shown significant promise for modeling learning and engagement across a range of educational contexts [2,27,30,31,35,36]. For example, Sümer et al examined learner engagement using pose estimation and facial expression data in school classrooms [35].…”
Section: Multimodal Learning Analytics (mentioning)
confidence: 99%