Human Detection in Crowded Situations by Combining Stereo Depth and Deeply-Learned Models

Beleznai, Csaba; Steininger, Daniel; Broneder, Elisabeth

doi:10.1007/978-3-030-04946-1_47

Cited by 1 publication (2 citation statements)

References 17 publications

“…Approaches with increasing sophistication [25], [17] later employed the popular occupancy map concept or the voxel space [24] to delineate individual human candidates. The idea of combining the representational strength of learning on combined RGB-D inputs has been proposed by several papers [41], [26], [4]. Nevertheless, accomplished improvements are rather small.…”

Section: Related State of the Art

“…To detect human candidates in the depth data, we employ an occupancy map clustering scheme. In the occupancy map, clusters corresponding to humans and compact objects are delineated using a hierarchically-structured tree of learned shape templates [4]. Thus, local grouping within the two-dimensional occupancy map generates consistent object hypotheses and suppresses background clutter and noise.…”

Section: D Multi-object Detection and Tracking
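The occupancy-map grouping described in the excerpt above can be illustrated with a minimal Python sketch. It assumes the depth data has already been converted to 3D points in ground-plane coordinates (x and y on the ground, z as height above it, in meters). All function names and thresholds (`build_occupancy_map`, `cluster_occupancy`, the 0.1 m cell size, the height band) are hypothetical choices for illustration. Note that the cited work delineates clusters with a hierarchically-structured tree of learned shape templates [4]; this sketch substitutes plain 8-connected component grouping, which shows the local-grouping idea but not the template matching.

```python
from collections import deque
from math import floor


def build_occupancy_map(points, cell_size=0.1, min_height=0.2, max_height=2.2):
    """Project 3D points (x, y, z), with z = height above ground in meters,
    onto a 2D ground grid. Returns {(ix, iy): point_count}."""
    occupancy = {}
    for x, y, z in points:
        # Keep only points in a plausible human height band (drops floor/ceiling).
        if min_height <= z <= max_height:
            cell = (floor(x / cell_size), floor(y / cell_size))
            occupancy[cell] = occupancy.get(cell, 0) + 1
    return occupancy


def cluster_occupancy(occupancy, cell_size=0.1, min_count=3, min_cells=2):
    """Group well-supported cells into 8-connected clusters.

    Returns a sorted list of (x, y, n_cells) candidates, where (x, y) is the
    point-count-weighted cluster center in meters.
    Hypothetical stand-in for the learned shape-template tree of the source.
    """
    occupied = {c for c, n in occupancy.items() if n >= min_count}
    seen = set()
    candidates = []
    for start in occupied:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first flood fill over neighboring cells
            cx, cy = queue.popleft()
            component.append((cx, cy))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in occupied and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        if len(component) < min_cells:
            continue  # isolated cell: treat as noise
        total = sum(occupancy[c] for c in component)
        mx = sum(c[0] * occupancy[c] for c in component) / total
        my = sum(c[1] * occupancy[c] for c in component) / total
        # +0.5 moves from a cell index to the cell center before scaling to meters.
        candidates.append(((mx + 0.5) * cell_size, (my + 0.5) * cell_size, len(component)))
    return sorted(candidates)
```

Cells supported by too few points and clusters covering a single cell are discarded, which crudely mimics the suppression of background clutter and noise; a template-based scheme would additionally be able to split merged clusters of people standing close together.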

RGB-D Railway Platform Monitoring and Scene Understanding for Enhanced Passenger Safety

Wallner, Steininger, Widhalm, et al. (2021)

Pattern Recognition. ICPR International Workshops and Challenges

Self-citation

Automated monitoring and analysis of passenger movement in safety-critical parts of transport infrastructures represents a relevant visual surveillance task. Recent breakthroughs in visual representation learning and spatial sensing have opened up new possibilities for detecting and tracking humans and objects within a 3D spatial context. This paper proposes a flexible analysis scheme and a thorough evaluation of various processing pipelines to detect and track humans on a ground plane, calibrated automatically via stereo depth and pedestrian detection. We consider multiple combinations within a set of RGB- and depth-based detection and tracking modalities. We exploit the modular concepts of Meshroom [2] and demonstrate its use as a generic vision processing pipeline and scalable evaluation framework. Furthermore, we introduce a novel open RGB-D railway platform dataset with annotations to support research activities in automated RGB-D surveillance. We present quantitative results for multiple object detection and tracking for various algorithmic combinations on our dataset. Results indicate that the combined use of depth-based spatial information and learned representations yields substantially enhanced detection and tracking accuracies. As demonstrated, these enhancements are especially pronounced in adverse situations, such as when occlusions occur or when objects not captured by learned representations are present.
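One way to obtain a ground plane from stereo depth and pedestrian detections, as the abstract mentions, is to back-project the foot points of detected pedestrians to 3D and fit a plane to them. The Python sketch below assumes camera coordinates with the Y axis pointing roughly along gravity (downward) and uses an ordinary least-squares fit of Y = aX + bZ + c. The function names are hypothetical, and the abstract does not state which estimator the authors actually use; with real detections, a robust estimator such as RANSAC would likely be needed to reject outlying foot points.

```python
from math import sqrt


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists (cofactor expansion)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ground_plane(foot_points):
    """Least-squares fit of Y = a*X + b*Z + c to 3D foot points (X, Y, Z).

    Assumes camera coordinates with Y pointing roughly downward, so the ground
    is never parallel to the Y axis. Returns (a, b, c).
    """
    if len(foot_points) < 3:
        raise ValueError("need at least three foot points")
    # Accumulate the normal equations (A^T A) p = A^T y for rows [X, Z, 1].
    sxx = sxz = sx = szz = sz = n = sxy = szy = sy = 0.0
    for x, y, z in foot_points:
        sxx += x * x; sxz += x * z; sx += x
        szz += z * z; sz += z; n += 1
        sxy += x * y; szy += z * y; sy += y
    m = [[sxx, sxz, sx], [sxz, szz, sz], [sx, sz, n]]
    rhs = [sxy, szy, sy]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("degenerate foot points (e.g. collinear on the ground)")
    # Solve the 3x3 system with Cramer's rule: replace one column at a time.
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(_det3(mc) / det)
    return tuple(solution)


def height_above_plane(point, plane):
    """Signed distance from a 3D point to the plane Y = a*X + b*Z + c.

    Positive means above the ground (smaller Y, since Y points down).
    """
    a, b, c = plane
    x, y, z = point
    return (a * x + b * z + c - y) / sqrt(a * a + 1.0 + b * b)
```

Once the plane is known, `height_above_plane` gives each 3D point's height above the ground, which is the quantity needed to project points into a ground-plane occupancy map.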