POSEidon: Face-from-Depth for Driver Pose Estimation

IEEE Trans. Pattern Anal. Mach. Intell.

Fabbri, Vezzani, et al. 2020

Self Cite

Depth cameras make it possible to set up reliable solutions for people monitoring and behavior understanding, especially when unstable or poor illumination conditions make common RGB sensors unusable. Therefore, we propose a complete framework for the estimation of the head and shoulder pose based on depth images only. A head detection and localization module is also included, in order to develop a complete end-to-end system. The core element of the framework is a Convolutional Neural Network, called POSEidon+, that receives three types of images as input and provides the 3D angles of the pose as output. Moreover, a Face-from-Depth component based on a Deterministic Conditional GAN model is able to hallucinate a face from the corresponding depth image. We empirically demonstrate that this positively impacts the system performance. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Experimental results show that our method outperforms several recent state-of-the-art works based on both intensity and depth input data, running in real time at more than 30 frames per second.

Section: Related Work (mentioning)

confidence: 99%

Section: Introduction (mentioning)

confidence: 99%

Face-from-Depth for Head Pose Estimation on Depth Images

IEEE Trans. Pattern Anal. Mach. Intell.

Fabbri, Vezzani, et al. 2020

Self Cite

“…Experiments are conducted exploiting two publicly available datasets: Pandora [6] and MotorMark [13]. Pandora contains more than 250k frames, splitted into 110 annotated sequences of 22 different actors (10 males and 12 females), while MotorMark is composed of more than 30k frames of 35 different subjects, guaranteeing a great variety of face appearances.…”

Section: A Datasets (mentioning)

confidence: 99%

Domain Translation with Conditional GANs: from Depth to RGB Face-to-Face

2018 24th International Conference on Pattern Recognition (ICPR)

Fabbri, Lanzi, et al. 2018

Self Cite

Can faces acquired by low-cost depth sensors be useful to catch some characteristic details of the face? Typically the answer is no. However, new deep architectures can generate RGB images from data acquired in a different modality, such as depth data. In this paper, we propose a new Deterministic Conditional GAN, trained on annotated RGB-D face datasets, effective for a face-to-face translation from depth to RGB. Although the network cannot reconstruct the exact somatic features for unknown individual faces, it is capable of reconstructing plausible faces; their appearance is accurate enough to be used in many pattern recognition tasks. In fact, we test the network's capability to hallucinate with some Perceptual Probes, such as face aspect classification or landmark detection. Depth faces can be used in place of the corresponding RGB images, which are often not available due to difficult luminance conditions. Experimental results are very promising and far better than previously proposed approaches: this domain translation can constitute a new way to exploit depth data in future applications.

“…The Pandora dataset was introduced in [3] for the head pose estimation task in depth images. It consists of more than Table 1.…”

Section: Pandora Dataset (mentioning)

confidence: 99%

Learning to Generate Facial Depth Maps

2018 International Conference on 3D Vision (3DV)

Pini, Grazioli, et al. 2018

Self Cite

In this paper, an adversarial architecture for facial depth map estimation from monocular intensity images is presented. By following an image-to-image approach, we combine the advantages of supervised learning and adversarial training, proposing a conditional Generative Adversarial Network that effectively learns to translate intensity face images into the corresponding depth maps. Two public datasets, namely the Biwi database and the Pandora dataset, are exploited to demonstrate that the proposed model generates high-quality synthetic depth images, both in terms of visual appearance and informative content. Furthermore, we show that the model is capable of predicting distinctive facial details by testing the generated depth maps through a deep model trained on authentic depth maps for the face verification task.