2023
DOI: 10.1109/tcsvt.2022.3208714

Counting Varying Density Crowds Through Density Guided Adaptive Selection CNN and Transformer Estimation

Cited by 27 publications (3 citation statements). References 49 publications.

“…Tian et al [17] proposed a pyramid feature aggregation network, which fuses global context features extracted by a Transformer with local features to generate high-quality density maps. Song et al [18] designed an aggregated counting network that incorporates a point-based localization method to improve the network's sensitivity to crowd locations; Chen et al [19] designed a new region selection mechanism that classifies image regions as high or low density by combining the output features of the Transformer and the convolutional layers, improving counting accuracy; Wang et al [20] proposed a spatial context learning network in which convolutional layers with different kernel sizes extract features at the corresponding scales to obtain the spatial information required for counting.…”
Section: Counting Methods Based On Density Maps (mentioning)
confidence: 99%
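The region selection idea attributed to Chen et al [19] routes each image region to either the Transformer or the CNN estimate depending on whether it is judged high or low density. The NumPy sketch below illustrates only that routing step, under loud assumptions: both density maps are already given, regions are fixed non-overlapping square patches, and the high/low decision is a hand-set threshold on the mean of the two estimates (the hypothetical `patch` and `threshold` parameters), not the learned classifier over combined Transformer and convolutional features that the citing text describes. Low-density patches take the CNN estimate, consistent with the observation that a pure Transformer is less reliable in sparse regions.

```python
import numpy as np


def density_guided_selection(cnn_map, trans_map, patch=4, threshold=0.05):
    """Hypothetical sketch: per patch, keep the CNN estimate where the region
    looks sparse and the Transformer estimate where it looks dense."""
    if cnn_map.shape != trans_map.shape:
        raise ValueError("density maps must have the same shape")
    h, w = cnn_map.shape
    if h % patch or w % patch:
        raise ValueError("map size must be divisible by the patch size")
    fused = np.empty_like(cnn_map)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            c = cnn_map[i:i + patch, j:j + patch]
            t = trans_map[i:i + patch, j:j + patch]
            # Assumed density guide: mean of both estimates in this patch.
            guide = 0.5 * (c.mean() + t.mean())
            fused[i:i + patch, j:j + patch] = t if guide > threshold else c
    return fused


# Toy example: one dense patch in the top-left corner, sparse elsewhere.
cnn = np.full((8, 8), 0.01)
trans = np.full((8, 8), 0.02)
cnn[:4, :4] = 1.0
trans[:4, :4] = 0.9
fused = density_guided_selection(cnn, trans)
count = fused.sum()  # the crowd count is the sum (integral) of the density map
print(round(float(count), 2))  # 14.88
```

Here the dense patch contributes 16 × 0.9 from the Transformer map and the 48 sparse cells contribute 0.01 each from the CNN map. A learned selector would replace the fixed threshold, but the final count is still obtained by summing the fused map.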
“…SANet (Cao et al 2018) introduced the SSIM (Wang et al 2004) loss function, which optimizes the similarity between the predicted density map and the ground truth density map based on three indicators: brightness, contrast, and structure. CTASNet (Chen et al 2022) used optimal transport loss and total variation loss to reduce the impact of annotation noise. In this work, we also construct a comprehensive loss function to enhance the accuracy and robustness of the model.…”
Section: Crowd Counting (mentioning)
confidence: 99%
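Two of the loss terms mentioned above can be sketched as follows, assuming 2-D NumPy density maps. `ssim_loss` is a simplified, single-window version of SSIM (Wang et al 2004) that combines the luminance ("brightness"), contrast, and structure terms once over the whole map instead of over local windows, so it is not SANet's exact loss. `tv_loss` follows one common crowd-counting definition: the total variation distance between the two maps after each is normalized to sum to 1, as paired with an optimal transport loss in DM-Count. The optimal transport term itself is not sketched, and the weights in the hypothetical `combined_loss` are placeholders, not values from SANet or CTASNet.

```python
import numpy as np


def ssim_loss(pred, gt, data_range=1.0):
    """1 - SSIM over a single global window (simplification of windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for contrast/structure terms
    mu_p, mu_g = pred.mean(), gt.mean()
    var_p, var_g = pred.var(), gt.var()
    cov = ((pred - mu_p) * (gt - mu_g)).mean()
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2)
    )
    return 1.0 - ssim


def tv_loss(pred, gt, eps=1e-12):
    """Total variation distance between the maps normalized to unit mass."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return 0.5 * np.abs(p - g).sum()


def combined_loss(pred, gt, w_count=1.0, w_ssim=1.0, w_tv=0.1):
    """Hypothetical weighted sum of count error, SSIM loss, and TV loss."""
    count_err = abs(pred.sum() - gt.sum())
    return w_count * count_err + w_ssim * ssim_loss(pred, gt) + w_tv * tv_loss(pred, gt)


gt = np.array([[1.0, 0.0], [0.0, 0.0]])
shifted = np.array([[0.0, 1.0], [0.0, 0.0]])
print(tv_loss(shifted, gt), ssim_loss(shifted, gt) > 0)
```

The TV term ignores the total count (a map scaled by any positive factor scores 0), which is why a sketch like this pairs it with an explicit count error.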
“…These advantages partially compensate for the limitations of CNNs, enabling the Transformer to better comprehend the relationships between semantic elements in different regions of the entire image. However, Chen et al (2022) found that a pure Transformer is less reliable in sparse crowd regions, and it is difficult to accurately localize and count the target crowds. This is primarily attributed to the self-attention mechanism of Transformers, which excels at capturing global relationships but lacks direct modeling of local receptive fields.…”
Section: Introduction (mentioning)
confidence: 99%
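To illustrate the contrast drawn above between global self-attention and local receptive fields, the toy NumPy example below compares a single-head self-attention layer with a 3-tap convolution on a 1-D sequence of tokens standing in for flattened image patches. The sizes, random weights, and kernel are arbitrary assumptions, and this is not the architecture of Chen et al (2022). Perturbing only the last token changes the attention output at position 0, while the convolution output at position 0 stays exactly the same because its receptive field covers only neighboring tokens.

```python
import numpy as np


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over tokens x of shape (n, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over all n tokens
    return attn @ v


def conv1d_same(x, kernel):
    """1-D convolution with one kernel shared across channels and zero padding;
    output i depends only on tokens i - k//2 .. i + k//2."""
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(kernel[:, None] * xp[i:i + k]).sum(axis=0)
                     for i in range(x.shape[0])])


rng = np.random.default_rng(0)
n, d = 16, 8
x = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
kernel = np.array([0.25, 0.5, 0.25])

x_far = x.copy()
x_far[-1] += 1.0  # perturb the token farthest from position 0

attn_change = np.abs(self_attention(x_far, wq, wk, wv)[0]
                     - self_attention(x, wq, wk, wv)[0]).max()
conv_change = np.abs(conv1d_same(x_far, kernel)[0]
                     - conv1d_same(x, kernel)[0]).max()
print(attn_change > 0, conv_change == 0.0)
```

Stacking convolutions widens the receptive field gradually, whereas every attention layer mixes all positions at once; the trade-off is that attention has no built-in bias toward local neighborhoods.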