Head Pose Classification in Crowded Scenes

Orozco, Javier; Gong, Shaogang; Xiang, Tao

doi:10.5244/c.23.120

Cited by 67 publications

(56 citation statements)

References 16 publications

“…Here these solutions are inapplicable since the faces are too small (50x40 pixels on average). In a low resolution domain the work proposed by Orozco et al [33] seems to fit better, relying on the computation of the mean image for each orientation class. Distances w.r.t.…”

Section: Head Pose Estimation (mentioning)

confidence: 99%

The S-HOCK dataset: Analyzing crowds at the stadium

Conigliaro, Rota, Setti, et al. 2015

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

The topic of crowd modeling in computer vision usually assumes a single generic typology of crowd, which is very simplistic. In this paper we adopt a taxonomy that is widely accepted in sociology, focusing on a particular category, the spectator crowd, which is formed by people "interested in watching something specific that they came to see" [6]. Such crowds can be found at stadiums, amphitheaters, cinemas, etc. In particular, we propose a novel dataset, the Spectators Hockey (S-HOCK), which covers 4 hockey matches during an international tournament. The dataset has been annotated extensively, focusing on the spectators at different levels of detail: at a higher level, people have been labeled according to the team they support and whether they know the people close to them; at lower levels, standard pose information has been considered (regarding the head and the body), as well as fine-grained actions such as hands on hips, clapping hands, etc. The labeling also covers the game field, making it possible to relate what is going on in the match to the crowd behavior. This resulted in more than 100 million annotations, useful for standard applications such as people counting and head pose estimation, but also for novel tasks such as spectator categorization. For all of these we provide protocols and baseline results, encouraging further research.

“…corresponding facial landmarks such as eyes and lips to a set of trained poses. Recent studies have attempted to estimate head pose in low-resolution images [8] as well as crowded surveillance videos [52]. In addition to head pose, body posture configuration [46] and gait [49] may also play an important role in human intent inference.…”

Section: Intent Profiling (mentioning)

confidence: 99%

Security and Surveillance

Gong, Loy, Xiang 2011

Visual Analysis of Humans

Self Cite

Human eyes are highly efficient devices for scanning through a large quantity of low-level visual sensory data and delivering selective information to one's brain for high-level semantic interpretation and gaining situational awareness. Over the last few decades, the computer vision community has endeavoured to bring about similar perceptual capabilities to artificial visual sensors. Substantial efforts have been made towards understanding static images of individual objects and the corresponding processes in the human visual system. This endeavour is intensified further by the need for understanding a massive quantity of video data, with the aim to comprehend multiple entities not only within a single image but also over time across multiple video frames for understanding their spatio-temporal relations. A significant application of video analysis and understanding is intelligent surveillance, which aims to interpret automatically human activity and detect unusual events that could pose a threat to public security and safety.

“…Head-pose classification from surveillance images has been investigated in a number of works [3,5,16,19]. In [16], a Kullback-Leibler distance-based facial appearance descriptor is proposed for low resolution images. The array-of-covariances (ARCO) descriptor is introduced in [19], and is found to be effective for representing faces as it is robust to scale and illumination changes.…”

Section: Related Work (mentioning)

confidence: 99%

“…However, most existing approaches compute the head pose from high resolution images, where facial features are clearly visible. Estimating the head pose from large field-of-view surveillance cameras, where faces are typically captured at 50×50 or lower pixel resolution, has received importance only recently [5,16,19]. Computing the head pose under these conditions is difficult, as faces appear blurred and models employing detailed facial information are ineffective.…”

Section: Introduction (mentioning)

confidence: 99%

No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion

Yan, Ricci, Subramanian, et al. 2013

2013 IEEE International Conference on Computer Vision

We propose a novel Multi-Task Learning framework (FEGA-MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. As the target (person) moves, distortions in facial appearance owing to camera perspective and scale severely impede performance of traditional head pose classification methods. FEGA-MTL operates on a dense uniform spatial grid and learns appearance relationships across partitions as well as partition-specific appearance variations for a given head pose to build region-specific classifiers. Guided by two graphs which a-priori model appearance similarity among (i) grid partitions based on camera geometry and (ii) head pose classes, the learner efficiently clusters appearance-wise related grid partitions to derive the optimal partitioning. For pose classification, upon determining the target's position using a person tracker, the appropriate region-specific classifier is invoked. Experiments confirm that FEGA-MTL achieves state-of-the-art classification with few training data.
