2021
DOI: 10.48550/arxiv.2112.15509
Preprint

Scene-Adaptive Attention Network for Crowd Counting

Abstract: In recent years, significant progress has been made in crowd counting research. However, because of the challenging scale variations and complex scenes found in crowds, neither traditional convolutional networks nor recent Transformer architectures with fixed-size attention handle the task well. To address this problem, this paper proposes a scene-adaptive attention network, termed SAANet. First, we design a Transformer backbone with built-in deformable attention, which learns adaptive feature repre…

Cited by 3 publications (4 citation statements)
References 40 publications
“…BCCTrans [28] introduces a global context learnable token to guide the counting. SAANet [40] designs a deformer backbone to extract the features, aggregates multi-level features with a deformable transformer encoder, and introduces a count query in a transformer decoder to re-calibrate the multi-level feature maps. DCSwinTrans [10] enhances the large-range contextual information with a dilated Swin Transformer backbone, and is equipped with a feature pyramid network decoder to achieve crowd instant localization.…”
Section: Transformer Based Crowd Counting
mentioning
confidence: 99%
“…It is widely studied by the academic and industrial communities since the number of persons is an important indicator for incident monitoring [31], traffic control [19], and infectious disease prevention [32]. The existing crowd counting methods have achieved tremendous improvement due to the introduction of convolutional neural networks [7,8] and transformers [28,40]. However, when light is insufficient, the performance of crowd counting is unsatisfying, as shown in the first line of Fig. 1.…”
mentioning
confidence: 99%
“…To alleviate the difficulty of collecting and annotating crowd counting datasets, some works [15], [16], [18] explore domain-adaptive crowd counting from synthetic datasets to the real world. In addition, with the Vision Transformer (ViT) [3] first applying the transformer structure to vision tasks, many transformer-based [25] crowd counting methods [43]-[45] have been proposed with outstanding performance.…”
Section: A. RGB-based Crowd Counting
mentioning
confidence: 99%
“…Crowd analysis is a popular application of computer vision and has achieved superb success, especially in crowd counting [10], [46], [59]. Crowd counting is a fundamental task, which estimates the sum counts of instances in crowd scenes.…”
Section: Introduction
mentioning
confidence: 99%