Adaptive Dilated Network With Self-Correction Supervision for Counting

Bai, Song; He, Zhiqun; Qiao, Yu; Hu, Hanzhe; Wu, Wei; Yan, Junjie

doi:10.1109/cvpr42600.2020.00465

Cited by 157 publications

(62 citation statements)

References 41 publications

“…Furthermore, to correct small errors of ground truth caused by the empirically-chosen parameter σ, Wan et al [50], [51] utilized the kernel-based density map to refine the final density map. Bai et al [52] self-corrected the density map by EM algorithm. ZoomCount [28] proposed a zooming mechanism to tackle the underestimation and overestimation issues due to the density variation problem.…”

Section: B. CNN-based Methods (mentioning)

confidence: 99%

S$^2$FPR: Crowd Counting via Self-Supervised Coarse to Fine Feature Pyramid Ranking

Gao, Huang, Lei et al. 2022

Preprint

Most conventional crowd counting methods utilize a fully-supervised learning framework to learn a mapping between scene images and crowd density maps. Under the circumstances of such fully-supervised training settings, a large quantity of expensive and time-consuming pixel-level annotations are required to generate density maps as the supervision. One way to reduce costly labeling is to exploit self-structural information and inner-relations among unlabeled images. Unlike the previous methods utilizing these relations and structural information from the original image level, we explore such self-relations from the latent feature spaces because it can extract more abundant relations and structural information. Specifically, we propose S$^2$FPR which can extract structural information and learn partial orders of coarse-to-fine pyramid features in the latent space for better crowd counting with massive unlabeled images. In addition, we collect a new unlabeled crowd counting dataset (FUDAN-UCC) with 4,000 images in total for training. One by-product is that our proposed S$^2$FPR method can leverage numerous partial orders in the latent space among unlabeled images to strengthen the model representation capability and reduce the estimation errors for the crowd counting task. Extensive experiments on four benchmark datasets, i.e. the UCF-QNRF, the ShanghaiTech PartA and PartB, and the UCF-CC-50, show the effectiveness of our method compared with previous semi-supervised methods. The source code and dataset are available at https://github.com/bridgeqiqi/S2FPR.

“…Sam et al [36] proposed locating each person in a dense crowd using a bounding box to size the identified heads and then counting them. Another study proposed an adaptive dilated convolution that can learn a continuous hole rate at different positions in the image to effectively match changes in the scale at different positions [37]. PACNN [38] framework eliminates the need for a density regression paradigm.…”

Section: Related Work (mentioning)

confidence: 99%

Multiscale Aggregate Networks with Dense Connections for Crowd Counting

Zhang, Wan et al. 2021

Computational Intelligence and Neuroscience

The most advanced method for crowd counting uses a fully convolutional network that extracts image features and then generates a crowd density map. However, this process often encounters multiscale and contextual loss problems. To address these problems, we propose a multiscale aggregation network (MANet) that includes a feature extraction encoder (FEE) and a density map decoder (DMD). The FEE uses a cascaded scale pyramid network to extract multiscale features and obtains contextual features through dense connections. The DMD uses deconvolution and fusion operations to generate features containing detailed information. These features can be further converted into high-quality density maps to accurately calculate the number of people in a crowd. An empirical comparison using four mainstream datasets (ShanghaiTech, WorldExpo’10, UCF_CC_50, and SmartCity) shows that the proposed method is more effective in terms of the mean absolute error and mean squared error. The source code is available at https://github.com/lpfworld/MANet.

“…The state-of-the-art crowd counting methods are mostly concentrated on density map estimation in recent years, which integrates the density map as a count value. CNN-based methods [18], [19], [20], [21] show its powerful capacity of feature extraction than hand-crafted features models [22], [23]. Some methods [24], [25], [26], [27], [28] work on network architectures or specific modules to regress pixel-wise or patch-wise density maps.…”

Section: Crowd Counting (mentioning)

confidence: 99%

LDC-Net: A Unified Framework for Localization, Detection and Counting in Dense Crowds

Wang, Han, Gao et al. 2021

Preprint

The rapid development in visual crowd analysis shows a trend to count people by positioning or even detecting, rather than simply summing a density map. It also enlightens us back to the essence of the field, detection to count, which can give more abundant crowd information and has more practical applications. However, some recent work on crowd localization and detection has two limitations: 1) The typical detection methods cannot handle the dense crowds and a large variation in scale; 2) The density map heuristic methods suffer from performance deficiency in position and box prediction, especially in high density or large-size crowds. In this paper, we devise a tailored baseline for dense crowds location, detection, and counting from a new perspective, named as LDC-Net for convenience, which has the following features: 1) A strong but minimalist paradigm to detect objects by only predicting a location map and a size map, which endows an ability to detect in a scene with any capacity (0 ∼ 10,000+ persons); 2) Excellent cross-scale ability in facing a large variation, such as the head ranging in 0 ∼ 100,000+ pixels; 3) Achieve superior performance in location and box prediction tasks, as well as a competitive counting performance compared with the density-based methods. Finally, the source code and pre-trained models will be released.
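
For readers unfamiliar with the density-map counting that several of the statements above refer to, the following is a minimal illustrative sketch, not taken from any of the cited papers. It assumes a fixed-width Gaussian per annotated head position; the function names, the kernel size, and the value of sigma are hypothetical choices made only for this example.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
               for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def density_map(height, width, points, sigma=2.0, size=9):
    """Place one unit-mass Gaussian per annotated head position (row, col).

    Assumes every point lies inside the image.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in points:
        # Keep only kernel cells that fall inside the image, then renormalize
        # so each person still contributes exactly 1 near the border.
        cells = []
        for dy in range(size):
            for dx in range(size):
                rr, cc = r + dy - half, c + dx - half
                if 0 <= rr < height and 0 <= cc < width:
                    cells.append((rr, cc, kernel[dy][dx]))
        mass = sum(w for _, _, w in cells)
        for rr, cc, w in cells:
            dmap[rr][cc] += w / mass
    return dmap

def count_from_density(dmap):
    """The count estimate is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in dmap)

# Hypothetical annotations: three heads, one of them in the image corner.
heads = [(5, 5), (10, 20), (0, 0)]
estimated = count_from_density(density_map(32, 32, heads))
```

Because each kernel carries unit mass, summing the map recovers the number of annotated heads. A counting network is trained to regress such a map, and its predicted count is obtained with the same sum.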
