ADCrowdNet: An Attention-injective Deformable Convolutional Network for Crowd Understanding

Liu, Ning; Long, Yongchao; Zhang, Changqing; Niu, Qun; Pan, Li; Wu, Hefeng

doi:10.48550/arxiv.1811.11968

Cited by 11 publications

(8 citation statements)

References 33 publications

“…Most recently, several methods have focused on incorporating additional cues such as segmentation and semantic priors [61,75], attention [31,54,58], perspective [50], context information respectively [33], multiple-views [70] and multi-scale features [20] into the network. Wang et al [63] introduced a new synthetic dataset and proposed a SSIM based CycleGAN [78] to adapt the synthetic datasets to real world dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting

Sindagi

Patel

2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

182

Crowd counting presents enormous challenges in the form of large variation in scales within images and across the dataset. These issues are further exacerbated in highly congested scenes. Approaches based on straightforward fusion of multi-scale features from a deep network seem to be obvious solutions to this problem. However, these fusion approaches do not yield significant improvements in the case of crowd counting in congested scenes. This is usually due to their limited abilities in effectively combining the multi-scale features for problems like crowd counting. To overcome this, we focus on how to efficiently leverage information present in different layers of the network. Specifically, we present a network that involves: (i) a multilevel bottom-top and top-bottom fusion (MBTTBF) method to combine information from shallower to deeper layers and vice versa at multiple levels, (ii) scale complementary feature extraction blocks (SCFB) involving cross-scale residual functions to explicitly enable flow of complementary features from adjacent conv layers along the fusion paths. Furthermore, in order to increase the effectiveness of the multi-scale fusion, we employ a principled way of generating scale-aware ground-truth density maps for training. Experiments conducted on three datasets that contain highly congested scenes (ShanghaiTech, UCF CROWD 50, and UCF-QNRF) demonstrate that the proposed method is able to outperform several recent methods in all the datasets.

“…Recent approaches like [31,59,60,61,62,63] have aimed at incorporating various forms of related information like attention [59], semantic priors [60], segmentation [61], inverse attention [62], and hierarchical attention [31] respectively into the network. Other techniques such as [64,65,66,67,68] leverage features from different layers of the network using different techniques like trellis style encoder decoder [64], explicitly considering perspective [65], context information [66], adaptive density map generation [68] and multiple views [67].…”

Section: Related Work (mentioning)

confidence: 99%

JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method

Sindagi

Yasarla

Patel

2020

IEEE Trans. Pattern Anal. Mach. Intell.

167

Due to its variety of applications in the real-world, the task of single image-based crowd counting has received a lot of interest in the recent years. Recently, several approaches have been proposed to address various problems encountered in crowd counting. These approaches are essentially based on convolutional neural networks that require large amounts of data to train the network parameters. Considering this, we introduce a new large scale unconstrained crowd counting dataset (JHU-CROWD++) that contains "4,372" images with "1.51 million" annotations. In comparison to existing datasets, the proposed dataset is collected under a variety of diverse scenarios and environmental conditions. Specifically, the dataset includes several images with weather-based degradations and illumination variations, making it a very challenging dataset. Additionally, the dataset consists of a rich set of annotations at both image-level and head-level. Several recent methods are evaluated and compared on this dataset. The dataset can be downloaded from http://www.crowdcounting.com. Furthermore, we propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation. The proposed method uses VGG16 as the backbone network and employs density map generated by the final layer as a coarse prediction to refine and generate finer density maps in a progressive fashion using residual learning. Additionally, the residual learning is guided by an uncertainty-based confidence weighting mechanism that permits the flow of only high-confidence residuals in the refinement path. The proposed Confidence Guided Deep Residual Counting Network (CG-DRCN) is evaluated on recent complex datasets, and it achieves significant improvements in errors.

“…Crowd understanding, or crowd analysis, a topic related to group detection, is also an active research field. Ning et al. proposed an attention-injective deformable convolutional network called ADCrowdNet, which could address the accuracy degradation problem of highly congested noisy scenes [20]. Yuting et al. developed a network that can handle both detection and crowded counting without annotation with bounding boxes [22].…”

Section: Group Detection (mentioning)

confidence: 99%

Birds Eye View Social Distancing Analysis System

Yang

Sun

Ye et al.

2021

Preprint

Social distancing can reduce the infection rates in respiratory pandemics such as COVID-19. Traffic intersections are particularly suitable for monitoring and evaluation of social distancing behavior in metropolises. We propose and evaluate a privacy-preserving social distancing analysis system (B-SDA), which uses bird's-eye view video recordings of pedestrians who cross traffic intersections. We devise algorithms for video preprocessing, object detection and tracking which are rooted in the known computer-vision and deep learning techniques, but modified to address the problem of detecting very small objects/pedestrians captured by a highly elevated camera. We propose a method for incorporating pedestrian grouping for detection of social distancing violations. B-SDA is used to compare pedestrian behavior based on pre-pandemic and pandemic videos in a major metropolitan area. The accomplished pedestrian detection performance is 63.0% AP50 and the tracking performance is 47.6% MOTA. The social distancing violation rate of 15.6% during the pandemic is notably lower than 31.4% pre-pandemic baseline, indicating that pedestrians followed CDC-prescribed social distancing recommendations. The proposed system is suitable for deployment in real-world applications.
