2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00109

Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting

Abstract: Crowd counting presents enormous challenges in the form of large variation in scales within images and across the dataset. These issues are further exacerbated in highly congested scenes. Approaches based on straightforward fusion of multi-scale features from a deep network seem to be obvious solutions to this problem. However, these fusion approaches do not yield significant improvements in the case of crowd counting in congested scenes. This is usually due to their limited abilities in effectively combining …

Cited by 182 publications (87 citation statements). References 73 publications.

“…On the other hand, since high-level semantic information is not adequate for recovering detailed spatial information, feature maps from shallower layers, which have smaller semantic levels, are required to encode low-level details and spatial information to refine the coarse high-level semantic features for accurate spatial localization [53]- [55].…”
Section: Pyramidal Feature Hierarchy (mentioning, confidence: 99%)
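
The refinement described in this excerpt, where shallow, high-resolution feature maps supply spatial detail to coarse but semantically strong deep features, is commonly implemented as a top-down pathway. The following is a minimal NumPy sketch under stated assumptions, not the method of [53]-[55] or of the ICCV paper: every level is assumed to have the same channel count (real networks typically insert 1x1 convolutions to match channels), each deeper level is assumed to halve the spatial size, and fusion is a plain element-wise sum after nearest-neighbour upsampling. The helper names `upsample2x` and `top_down_refine` are hypothetical.

```python
from typing import List

import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def top_down_refine(features: List[np.ndarray]) -> List[np.ndarray]:
    """Refine a feature pyramid from the deepest level down to the shallowest.

    Assumptions (illustrative only): `features` is ordered shallow -> deep,
    all levels share the same channel count, and each deeper level has
    exactly half the height and width of the level before it.
    """
    refined = [features[-1]]  # deepest map: coarse but semantically strong
    for shallow in reversed(features[:-1]):
        coarse = upsample2x(refined[0])      # bring coarse semantics to this resolution
        refined.insert(0, shallow + coarse)  # shallow map contributes spatial detail
    return refined


# Usage: three pyramid levels filled with the constant 1.
feats = [np.ones((4, 8, 8)), np.ones((4, 4, 4)), np.ones((4, 2, 2))]
out = top_down_refine(feats)
print([o.shape for o in out])
```

With constant inputs of 1 at 8x8, 4x4 and 2x2, the refined maps hold 3, 2 and 1 respectively, because each level accumulates every coarser level above it. A bottom-up pass (downsampling fine maps and adding them to coarser ones) can be stacked after this to obtain a bidirectional scheme.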
“…Sam et al [3] improved MCNN and propose a switchable module to classify the crowd density of each patch and assign it to corresponding regressor. Sindagi et al [12] proposed a top-down and bottom-up multi-level fusion mechanism to fuse features for crowd counting. CSRNet [21] stacks dilated convolutions after VGGNet [40].…”
Section: Regression Based Crowd Counting (mentioning, confidence: 99%)
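
Dilated convolutions, as in the CSRNet back-end mentioned in this excerpt, enlarge the receptive field without pooling or extra parameters: a k x k kernel with dilation d spans d(k-1)+1 pixels, so each stride-1 layer grows the receptive field by d(k-1). The hypothetical helper below computes that growth for a stack of layers. The layer counts and dilation rates used here are illustrative, not CSRNet's exact configuration, and the result is measured on the feature map the layers operate on rather than on the input image.

```python
from typing import Sequence


def receptive_field(dilations: Sequence[int], kernel_size: int = 3) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    Measured in pixels of the feature map the stack is applied to.
    Each entry of `dilations` is one layer's dilation rate.
    """
    rf = 1
    for d in dilations:
        span = d * (kernel_size - 1) + 1  # pixels covered by one dilated kernel
        rf += span - 1                    # stride 1: grow by (span - 1)
    return rf


print(receptive_field([1] * 6))  # 13
print(receptive_field([2] * 6))  # 25
```

Six 3x3 layers therefore reach a receptive field of 13 with dilation 1 and 25 with dilation 2, at the same parameter count and without reducing resolution.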
“…Our method follows density map regression methods. Previous density regression based works [21], [41], [12], [39] usually first extract image/patch features using a backbone network(e.g. VGG16 [40]), and then perform density regression.…”
Section: Regression Based Crowd Counting (mentioning, confidence: 99%)
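
Density-map regression, as described in this excerpt, trains a network to predict a per-pixel density whose sum equals the number of people. A common way to build the ground truth is to place a normalized Gaussian at each annotated head position. The sketch below assumes a fixed kernel width `sigma` (several works instead use geometry-adaptive widths) and renormalizes kernels that are clipped at the image border so that each head contributes exactly 1 to the count. The function names are hypothetical and the code is not taken from any of the cited papers.

```python
from typing import Iterable, Tuple

import numpy as np


def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """Normalised 2-D Gaussian of size (2*radius+1, 2*radius+1)."""
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def density_map(points: Iterable[Tuple[int, int]], shape: Tuple[int, int],
                sigma: float = 2.0) -> np.ndarray:
    """Ground-truth density map: one unit-mass Gaussian per (row, col) head."""
    h, w = shape
    radius = int(3 * sigma)
    k = gaussian_kernel(sigma, radius)
    dmap = np.zeros((h, w))
    for r, c in points:
        # Clip the kernel's footprint to the image bounds.
        r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
        patch = k[r0 - (r - radius):r1 - (r - radius),
                  c0 - (c - radius):c1 - (c - radius)]
        # Renormalise so a head near the border still contributes exactly 1.
        dmap[r0:r1, c0:c1] += patch / patch.sum()
    return dmap


def count_from_density(dmap: np.ndarray) -> float:
    """Estimated crowd count: the integral (sum) of the density map."""
    return float(dmap.sum())


d = density_map([(5, 5), (0, 0), (19, 10)], shape=(20, 20))
print(round(count_from_density(d), 6))  # 3.0
```

A network's predicted density map is then compared with this target (often with a pixel-wise squared-error loss), and at test time the estimated count is simply the sum of the predicted map.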