2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018
DOI: 10.1109/cvprw.2018.00281
Fine-Grained Head Pose Estimation Without Keypoints

Abstract: Estimating the head pose of a person is a crucial problem with many applications, such as aiding gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally, head pose is computed by estimating keypoints from the target face and solving the 2D-to-3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model and an ad-hoc…
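Head pose in this line of work is usually expressed as three Euler angles (yaw, pitch, roll). As a hedged, pure-Python illustration (not the paper's code; ZYX rotation convention assumed), a rotation matrix produced by the traditional keypoint-based 2D-to-3D solver the abstract describes can be converted to those angles as follows:

```python
import math

def rotation_to_euler(R):
    """Convert a 3x3 rotation matrix (row-major nested lists) to
    (yaw, pitch, roll) in degrees under the ZYX convention.

    Illustrative only: the paper argues for predicting these angles
    directly from the image instead of going through keypoints."""
    # Under ZYX, R[2][0] = -sin(pitch); clamp to guard against
    # small numerical excursions outside [-1, 1].
    pitch = math.asin(max(-1.0, min(1.0, -R[2][0])))
    yaw = math.atan2(R[1][0], R[0][0])
    roll = math.atan2(R[2][1], R[2][2])
    return tuple(math.degrees(a) for a in (yaw, pitch, roll))

# Identity rotation corresponds to a frontal, upright head.
print(rotation_to_euler([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # (0.0, 0.0, 0.0)
```

In the keypoint-based pipeline this matrix would come from a PnP solver fed with detected 2D landmarks and a mean 3D head model; the fragility the authors point out is that every error in those landmarks propagates into the recovered angles.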

Cited by 496 publications (447 citation statements) · References 36 publications
“…Other approaches evaluated on the AFLW and AFW datasets are summarized in Table 3. The number of parameters of [27] is based on their provided open-source implementation, which is executable on a GPU-based system. In order to compare the frame rate, we reimplemented the LeNet-5 variant of [25].…”
Section: Methods
confidence: 99%
“…In order to compare the frame rate, we reimplemented the LeNet-5 variant of [25]. In comparison, our ResNet18-64 has the lowest number of parameters while predicting more accurately than the LeNet-5 variant [25] and nearly as accurately as the ResNet50 [27]. Patacchiola and Cangelosi [25] also use low-resolution images with 64 × 64 pixels, while Ruiz et al. [27] take larger images with 224 × 224 pixels.…”
Section: Methods
confidence: 99%
See 1 more Smart Citation
“…In this section, we compare several approaches using the Gaze360 dataset. We compared the following methods: Mean - uses the mean gaze of the training set for all predictions; Deep Head Pose - a deep-network-based head pose estimator by Ruiz et al. [19]; Static - the backbone model, ResNet-18, and two final layers to compute the prediction; TRN - a version of Temporal Relation Network [33] where the features of frames at fixed windows around time t are concatenated before averaging the predictions of the temporal windows; LSTM - refers to the Gaze360 architecture.…”
Section: Model Evaluation
confidence: 99%
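The simplest baseline in that list, Mean, can be sketched in a few lines of pure Python (illustrative names; gaze directions taken as 3D vectors):

```python
def mean_gaze_baseline(train_gazes):
    """Fit the 'Mean' baseline: return a predictor that outputs the
    componentwise mean training gaze for every input, ignoring the image.

    train_gazes: list of (x, y, z) gaze-direction tuples."""
    n = len(train_gazes)
    mean = tuple(sum(g[i] for g in train_gazes) / n for i in range(3))
    return lambda _image: mean  # constant predictor

predict = mean_gaze_baseline([(0.0, 0.0, -1.0), (0.2, 0.0, -0.8)])
print(predict(None))  # componentwise mean: (0.1, 0.0, -0.9)
```

Its only purpose in the comparison is as a floor: any learned model should beat a predictor that never looks at the input.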
“…Following [16] we use a truncated pre-trained ResNet-50 network to extract head pose from the raw face image I. We denote by φ the embeddings (2048 units) of the last fully-connected layer.…”
Section: Head Pose Estimation
confidence: 99%