2015
DOI: 10.1109/tmm.2015.2482819

Deep Head Pose: Gaze-Direction Estimation in Multimodal Video

Abstract: In this paper, we present a Convolutional Neural Network based model for human head pose estimation in low-resolution multi-modal RGB-D data. We pose the problem as one of classification of human gaze direction. We further fine-tune a regressor based on the learned deep classifier. Next, we combine the two models (classification and regression) to estimate approximate regression confidence. We present state-of-the-art results on datasets that span the range of high resolution Human Robot Interaction (close up f…
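The abstract says the classifier and the fine-tuned regressor are combined to estimate regression confidence, but it does not give the combination rule. The Python sketch below shows one plausible rule under assumptions that are ours, not the paper's: yaw is discretized into eight hypothetical 45° bins, the classifier emits one logit per bin, and confidence is the softmax probability the classifier assigns to the bin containing the regressor's angle. The names `NUM_BINS`, `angle_to_bin` and `regression_confidence` are hypothetical.

```python
import math

# Hypothetical discretization (not from the paper): 8 equal yaw bins over [-180, 180) degrees.
NUM_BINS = 8
BIN_WIDTH = 360.0 / NUM_BINS


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def angle_to_bin(yaw_deg):
    """Map a yaw angle in degrees to a bin index after wrapping into [-180, 180)."""
    wrapped = (yaw_deg + 180.0) % 360.0 - 180.0
    # min() guards against float rounding pushing the index to NUM_BINS.
    return min(int((wrapped + 180.0) // BIN_WIDTH), NUM_BINS - 1)


def regression_confidence(class_logits, regressed_yaw_deg):
    """Return (yaw, confidence), where confidence is the classifier's probability
    for the bin containing the regressed yaw (an assumed combination rule)."""
    probs = softmax(class_logits)
    return regressed_yaw_deg, probs[angle_to_bin(regressed_yaw_deg)]
```

For example, logits that strongly favour bin 4 (yaw in [0°, 45°)) yield a confidence of roughly 0.95 for a regressed yaw of 10°, but well under 0.05 for a regressed yaw of 100°, flagging a disagreement between the two models.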

Cited by 173 publications (66 citation statements)
References 39 publications
“…1. According to [24], we can classify existing computer vision based head pose estimation methods into two categories: learning based methods [1][2][3] [16] [25][26][27][28][29][30][31][32][33][34][35][36][37][38] that need large amount of training data and computational resources and geometry based methods [4][5][6][7][8][9][10] [39][40][41][42][43][44][45][46][47][48][49] that are fast but with a little lower accuracy, see section II for details. In this paper, as shown in Fig.…”
“…For example, Fathi et al [35] built a probabilistic generative model to simultaneously predict the sequence of gaze locations and the respective action label from first person view videos. Mukherjee and Robertson [38] estimated the gaze direction based on the head pose in multimodal videos, and managed to recover human-human/scene interactions. In the image domain, Recasens et al [36] proposed a method to detect the object regions being fixated at by human in the scene.…”
Section: Related Work
“…The gaze direction have been detected from combined video and depth signals [24,25] and utilized in the visual attention model to estimate human-to-human interaction. Human gaze has also been used for semantic mapping of human attention in the 3D environment [26].…”
Section: Kinect Sensor and Its Usage