A Real-Time Deep Network for Crowd Counting

Shi, Xiaowen; Li, Xin; Wu, Caili; Kong, Shuchen; Yang, Jing; He, Liang

doi:10.1109/icassp40776.2020.9053780

Cited by 49 publications

(19 citation statements)

References 21 publications

(25 reference statements)

“…The GAME (the lower the better) metric is adopted to evaluate the model performance following the experimental setting

Method          G(0)   G(1)   G(2)   G(3)   Params
Hydra-3s [19]   11.0   13.7   16.7   19.3   0.93M
MCNN [12,20]     7.5    9.1   11.5   15.9   0.15M
AMDCN [21]       9.8   13.3   15.0   15.9   0.33M
C-CNN [16]       5.7    8.0   10.8   14.6   0.073M
PFANet (Ours)    3.7    5.5    7.6   10.9   0.040M
CMS-CNN-3 [22]   7.2    9.7   11.4   13.5   1.03M
FCNN-skip [20]   4.6    8.4   11.1   16.1   2.80M
CSRNet [5]       3.6    5.6    8.6   15.0   16.26M
ADCrowdNet [23]  2.4    4.1    6.8   13.6   26.02M

G stands for GAME.…”

Section: Methods (mentioning)

confidence: 99%
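
The GAME metric quoted above (Grid Average Mean absolute Error) can be illustrated with a short sketch. GAME(L) splits each image into a 2^L × 2^L grid of non-overlapping cells and sums the absolute count error per cell, so GAME(0) reduces to the plain absolute count error. The function names and the nested-list representation of density maps are assumptions made for this example; this is not the evaluation code of any cited paper.

```python
def game(pred, gt, level):
    """GAME(level) for one image.

    pred and gt are density maps given as lists of rows (hypothetical format).
    The image is split into 2**level x 2**level cells and the absolute
    difference between predicted and true counts is summed over the cells.
    """
    cells = 2 ** level
    h, w = len(gt), len(gt[0])
    total = 0.0
    for i in range(cells):
        r0, r1 = i * h // cells, (i + 1) * h // cells
        for j in range(cells):
            c0, c1 = j * w // cells, (j + 1) * w // cells
            p = sum(sum(row[c0:c1]) for row in pred[r0:r1])
            g = sum(sum(row[c0:c1]) for row in gt[r0:r1])
            total += abs(p - g)
    return total


def mean_game(preds, gts, level):
    """Average GAME(level) over a set of images."""
    return sum(game(p, g, level) for p, g in zip(preds, gts)) / len(preds)


# Toy example: the total count is right, but the objects are misplaced.
gt = [[1, 0], [0, 1]]
pred = [[0, 1], [1, 0]]
print(game(pred, gt, 0), game(pred, gt, 1))
```

In the toy example the total count matches, so GAME(0) is zero, while GAME(1) penalizes the wrong locations. This is why the higher GAME levels in the table are never smaller than the lower ones.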

“…Four light-weight networks (Hydra-3s [19], MCNN [12,20], AMDCN [21], and C-CNN [16]) and some previous state-of-the-art large networks (CMS-CNN-3 [22], FCNN-skip [20], CSRNet [5], and ADCrowdNet [23]) are involved in the comparison. The results are reported in Table 2, where the lowest values in each of the two parts are in bold.…”

Section: Methods (mentioning)

confidence: 99%

“…We compare our proposed method with several other lightweight networks [12,16,15] in terms of running time and floating-point operations (FLOPs). All tests are carried out in the same environment on the same laptop, with an Intel i5-4210H CPU @ 2.9 GHz and a GeForce GTX 960M GPU.…”

Section: Speed Comparison (mentioning)

confidence: 99%
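
As a rough illustration of how parameter and FLOP figures like those in the comparison above are usually estimated, the sketch below counts the parameters and multiply-accumulate operations of standard convolution layers. The layer shapes are hypothetical and do not describe any of the cited networks, and counting one multiply-accumulate as two FLOPs is only one common convention; papers differ on this.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Number of learnable parameters in a k x k convolution."""
    return c_in * k * k * c_out + (c_out if bias else 0)


def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate operations for one forward pass of the layer."""
    return h_out * w_out * c_in * k * k * c_out


def conv_flops(c_in, c_out, k, h_out, w_out):
    """FLOPs under the assumption that one MAC counts as two operations."""
    return 2 * conv_macs(c_in, c_out, k, h_out, w_out)


# Hypothetical two-layer network: (c_in, c_out, kernel, out_height, out_width)
layers = [(3, 16, 3, 128, 128), (16, 32, 3, 64, 64)]
total_params = sum(conv_params(ci, co, k) for ci, co, k, _, _ in layers)
total_flops = sum(conv_flops(*layer) for layer in layers)
print(total_params, total_flops)
```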

Partial Feature Aggregation Network for Real-Time Object Counting

Zhang

2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Object counting has become an important task in computer vision for its practical applications in surveillance systems. Previous methods for object counting have achieved promising results in accuracy, but few researchers focus on the real-time performance of counting methods. In this paper, we propose an efficient and accurate light-weight network for object counting, called Partial Feature Aggregation Network (PFANet). In this novel method, a Partial Feature Aggregation (PFA) structure is designed to accelerate networks and improve the utilization of multi-scale features. Moreover, PFANet uses dilated convolution to enlarge the receptive field of the network. Experiments on two datasets indicate that our network outperforms the existing real-time counting networks in both accuracy and efficiency.
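
To make the remark about dilated convolution concrete, the sketch below computes the receptive field of a stack of convolution layers. A k × k kernel with dilation d covers d(k − 1) + 1 pixels per side, so dilation widens the receptive field without adding parameters. The layer configurations are hypothetical and are not taken from PFANet.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of stacked convolutions.

    layers is a list of (kernel_size, stride, dilation) tuples.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for k, stride, dilation in layers:
        k_eff = dilation * (k - 1) + 1  # effective kernel size with dilation
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf


plain = [(3, 1, 1)] * 3    # three ordinary 3x3 convolutions
dilated = [(3, 1, 2)] * 3  # same layers with dilation rate 2
print(receptive_field(plain), receptive_field(dilated))
```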

“…Recently, researchers have adopted deep learning-based methods instead of relying on hand-crafted features to generate high-quality density maps and achieve accurate crowd counting (Cao et al. 2018; Shen et al. 2018; Wang et al. 2020; Shi et al. 2020). These approaches can be applied to count different kinds of objects (e.g., vehicles and cells) instead of people (Li, Zhang, and Chen 2018; He et al. 2019).…”

Section: Deep Learning-based Approaches (mentioning)

confidence: 99%
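
The density maps mentioned in the statement above are typically built by placing a normalized Gaussian at each annotated object, so that the map sums to the object count. The following is a minimal sketch under simplifying assumptions: a fixed sigma, normalization over the whole image grid, and hypothetical annotation coordinates. Practical pipelines often use truncated or adaptive kernels instead.

```python
import math


def density_map(points, height, width, sigma=1.0):
    """Ground-truth density map from (row, col) point annotations."""
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        weights = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        norm = sum(map(sum, weights))  # each object contributes exactly 1
        for y in range(height):
            for x in range(width):
                dmap[y][x] += weights[y][x] / norm
    return dmap


# Two hypothetical annotations on an 8 x 8 image.
dmap = density_map([(2, 3), (5, 5)], height=8, width=8)
count = sum(map(sum, dmap))  # integrating the map recovers the count
print(round(count, 6))
```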

Motion-guided Non-local Spatial-Temporal Network for Video Crowd Counting

Bai; Chan

2021

Preprint

We study video crowd counting, which is to estimate the number of objects (people in this paper) in all the frames of a video sequence. Previous work on crowd counting has mostly focused on still images. There has been little work on how to properly extract and take advantage of the spatial-temporal correlation between neighboring frames in both short and long ranges to achieve high estimation accuracy for a video sequence. In this work, we propose Monet, a novel and highly accurate motion-guided non-local spatial-temporal network for video crowd counting. Monet first takes people flow (motion information) as guidance to coarsely segment the regions of pixels where a person may be. Given these regions, Monet then uses a non-local spatial-temporal network to extract both short- and long-range spatial-temporal contextual information. The whole network is finally trained end-to-end with a fused loss to generate a high-quality density map. Noting the scarcity and low quality (in terms of resolution and scene diversity) of the publicly available video crowd datasets, we have collected and built a large-scale video crowd counting dataset, VidCrowd, to contribute to the community. VidCrowd contains 9,000 frames of high resolution (2560 × 1440), with 1,150,239 head annotations captured in different scenes, crowd densities, and lighting conditions in two cities. We have conducted extensive experiments on the challenging VidCrowd and two public video crowd counting datasets: UCSD and Mall. Our approach achieves substantially better performance in terms of MAE and MSE as compared with other state-of-the-art approaches.
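
The MAE and MSE metrics named in the abstract can be sketched as follows. One assumption is worth flagging: in much of the crowd counting literature, "MSE" denotes the square root of the mean squared count error, and the sketch follows that convention; the abstract itself does not state which definition is used. The count values are made up for illustration.

```python
import math


def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and true per-frame counts."""
    n = len(gt_counts)
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / n


def crowd_mse(pred_counts, gt_counts):
    """Root of the mean squared count error (assumed crowd counting usage)."""
    n = len(gt_counts)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / n)


# Hypothetical per-frame counts.
pred_counts = [10, 20]
gt_counts = [12, 16]
print(mae(pred_counts, gt_counts), crowd_mse(pred_counts, gt_counts))
```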

“…In recent times, the crowd counting problem has been addressed by a large number of methods, such as SFANet [1], SegNet [1], NAS [2], compact convolutional neural networks [3], and HYGNN [4]. The prevalent crowd counting methods can be broadly categorized into detection-then-counting, direct count regression, CNN-based methods, and perspective-based methods.…”

Section: Introduction (mentioning)

confidence: 99%