2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.38

Deep Occlusion Reasoning for Multi-camera Multi-target Detection

Abstract: People detection in single 2D images has improved greatly in recent years. However, comparatively little of this progress has percolated into multi-camera multi-people tracking algorithms, whose performance still degrades severely when scenes become very crowded. In this work, we introduce a new architecture that combines Convolutional Neural Nets and Conditional Random Fields to explicitly model those ambiguities. One of its key ingredients are high-order CRF terms that model potential occlusions and give our …

Cited by 95 publications (84 citation statements)
References 28 publications
“…If at all, depth order in NVS has been handled through depth maps [63] and by introducing a discrete number of equally spaced depth layers [14], but none of these address the inherent scale ambiguity as done here with Bi-NVS. Closely related is the unsupervised person detection and segmentation method proposed in [2], which localizes and matches persons across views through a grid of candidate positions on the ground plane.…”
Section: NSD With Multiple Subjects (mentioning)
confidence: 99%
“…In [19], Baqué and Fleuret use such feature map as the input to train a Conditional Random Field (CRF) which can explicitly model the occlusions between each pedestrian on the ground plane. In [20], Chavdarova also utilizes the feature map generated by the deep CNN.…”
Section: Related Work (mentioning)
confidence: 99%
“…In order to estimate the ground-plane occupancy vector, some of the multi-camera object detection systems extract binary foreground mask as the feature, which is not robust in severely-occluded traffic scenes [15], [16], [18]. Some other algorithms use features that are generated by a deep Convolutional Neural Network (CNN) [19], [20]. The existing approaches fuse the extracted features to infer the occupancy vector.…”
Section: Introduction (mentioning)
confidence: 99%
“…Furthermore, we evaluate our person detector performance on the PETS S2L1 dataset which is one of the videos available for training in 3D MOT 2015 benchmark [35] and compare it against three recent baseline methods. It should be noted that POM-CNN [5] utilises a CNN-based foreground segmentation process while DOR [5] has coupled the CNN with conditional random fields specifically to handle occlusions and imitate the generative-discriminative training approach of GANs. Still the proposed approach outperforms the state-of-the-art methods in all the considered metrics.…”
Section: Person Detector Evaluation (mentioning)
confidence: 99%