2022
DOI: 10.1007/s11432-021-3445-y

TransCrowd: weakly-supervised crowd counting with transformers

Abstract: The mainstream crowd counting methods usually utilize a convolutional neural network (CNN) to regress a density map, requiring point-level annotations. However, annotating each person with a point is an expensive and laborious process. During the testing phase, the point-level annotations are not considered to evaluate the counting accuracy, which means the point-level annotations are redundant. Hence, it is desirable to develop weakly-supervised counting methods that just rely on count-level annotations, a mo…

Cited by 148 publications (51 citation statements)
References: 60 publications

“…Transformers have an inherent advantage in weakly-supervised crowd counting, since they can enhance global information about features and capture contextual knowledge. TransCrowd [12] was the first transformer-based crowd counting framework, which reformulates the counting problem from a sequential perspective to a counting perspective. CCTrans [31] is applicable to both fully-supervised and weakly-supervised data, and uses Twins [32] as a feature extraction framework.…”
Section: Weakly-supervised Crowd Counting (mentioning)
confidence: 99%
“…A CNN is limited to extracting a global receptive field without using a density map due to the characteristics of local feature extraction. In 2021, a transformer was introduced to the weakly-supervised crowd counting task [12]. The global attention of the corresponding network can effectively overcome the limited receptive field of CNN-based methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition, in weakly-supervised crowd counting, there are some other transformer based methods. TransCrowd [17] uses a learnable counting token or global average pooling on high-layer semantic tokens to represent the crowd numbers. It constructs a weakly supervised model from sequence-to-count perspective.…”
Section: Transformer Based Crowd Counting (mentioning)
confidence: 99%
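
The statement above names two ways of turning transformer tokens into a single crowd count: reading a learnable counting token, or global average pooling over the semantic tokens. The sketch below illustrates that distinction only. It is not the TransCrowd implementation: the function names, the plain linear regression head, the convention that the counting token sits at index 0, and the L1 loss against the count-level label are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(tokens: Sequence[Vector]) -> Vector:
    """Average all tokens element-wise (global average pooling)."""
    n = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]


def linear_head(feature: Vector, weights: Vector, bias: float) -> float:
    """Hypothetical regression head: a single linear layer giving one scalar."""
    return sum(f * w for f, w in zip(feature, weights)) + bias


def count_from_token(tokens: Sequence[Vector], weights: Vector, bias: float) -> float:
    """Token variant: regress the count from the extra counting token.

    Assumption: the counting token is stored at index 0 of the sequence.
    """
    return linear_head(tokens[0], weights, bias)


def count_from_gap(tokens: Sequence[Vector], weights: Vector, bias: float) -> float:
    """Pooling variant: regress the count from the mean of all tokens."""
    return linear_head(mean_pool(tokens), weights, bias)


def l1_loss(predicted_count: float, annotated_count: float) -> float:
    """Assumed training signal: absolute error against the count-level label."""
    return abs(predicted_count - annotated_count)


if __name__ == "__main__":
    # Three toy 2-dimensional tokens standing in for encoder outputs.
    toy_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    w, b = [1.0, 1.0], 0.0
    print(count_from_token(toy_tokens, w, b))
    print(count_from_gap(toy_tokens, w, b))
    print(l1_loss(count_from_gap(toy_tokens, w, b), 10.0))
```

In both variants the only supervision is the total count for the image, so no point-level annotation enters the loss.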
“…Inspired by the recent prominence of the transformer and its success in many CV problems such as image classification [26, 8, 27, 10, 28–30], object detection [31–33], segmentation [27, 34, 35], crowd counting [36, 37] and image restoration [38–40], Liang et al. [12] proposed a new state-of-the-art image restoration model based on the Swin transformer [10]. The SwinIR model consists yet again of three modules: a shallow feature extractor, a transformer-based deep feature extractor and a high-quality image reconstruction module.…”
Section: SwinIR Image Restoration (mentioning)
confidence: 99%