3D Random Occlusion and Multi-layer Projection for Deep Multi-camera Pedestrian Localization

Qiu, Rui; Xu, Ming; Yan, Yuyao; Smith, Jeremy S.; Yang, Xi

doi:10.1007/978-3-031-20080-9_40

Cited by 18 publications

(2 citation statements)

References 24 publications

“…proposed a multi-head self-attention based multi-view fusion method. Qiu et al (2022) proposed a data augmentation method by generating random 3D cylinder occlusions on the ground plane to relieve model overfitting.…”

Section: Related Work (mentioning)

confidence: 99%

“…This is not suitable for better validating and comparing different multiview people detection methods, not to mention for generalizing to novel new scenes with different camera layouts, or other more practical real-world application scenarios. Qiu et al (2022) noticed the issue and tried to solve the problem from the aspect of data augmentation, but still evaluated the methods only on small scenes. Besides, in contrast to SHOT (Song et al 2021) or MVDeTr (Hou and Zheng 2021) which uses self-attention weights, the proposed method estimates the view fusion weights in a supervised way without extra labeling efforts, resulting in more stable performance.…”

Section: Related Work (mentioning)

confidence: 99%

Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting

Zhang, Gong, Chen et al. 2024

AAAI

Recent deep learning-based multi-view people detection (MVD) methods have shown promising results on existing datasets. However, current methods are mainly trained and evaluated on small, single scenes with a limited number of multi-view frames and fixed camera views. As a result, these methods may not be practical for detecting people in larger, more complex scenes with severe occlusions and camera calibration errors. This paper focuses on improving multi-view people detection by developing a supervised view-wise contribution weighting approach that better fuses multi-camera information under large scenes. Besides, a large synthetic dataset is adopted to enhance the model's generalization ability and enable more practical evaluation and comparison. The model's performance on new testing scenes is further improved with a simple domain adaptation technique. Experimental results demonstrate the effectiveness of our approach in achieving promising cross-scene multi-view people detection performance.
