2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01146

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Cited by 428 publications (419 citation statements). References 24 publications.
“…For a fair comparison between PVTv2 and Swin Transformer [21], we keep all settings the same, including the ImageNet-1K pre-training and COCO fine-tuning strategies. We evaluate Swin Transformer and PVTv2 on four state-of-the-art detectors, including Cascade R-CNN [1], ATSS [37], GFL [17], and Sparse R-CNN [26]. PVTv2 obtains much better AP than Swin Transformer on all the detectors, showing its better feature representation ability.…”
Section: Object Detection (mentioning)
confidence: 94%
“…All models are trained on COCO train2017 (118k images) and evaluated on val2017 (5k images). We verify the effectiveness of PVTv2 backbones on top of mainstream detectors, including RetinaNet [19], Mask R-CNN [11], Cascade Mask R-CNN [1], ATSS [37], GFL [17], and Sparse R-CNN [26].…”
Section: Object Detection (mentioning)
confidence: 99%
“…DETR [5] adopts a three-layer perceptron to predict object box coordinates. However, as pointed out by GFLoss [28], directly regressing the coordinates is equivalent to fitting a Dirac delta distribution, which fails to consider the ambiguity and uncertainty in the datasets. This representation is neither flexible nor robust to challenges such as occlusion and cluttered backgrounds in object tracking.…”
Section: A Simple Baseline Based On Transformer (mentioning)
confidence: 99%
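The excerpt above contrasts direct coordinate regression (a Dirac delta) with the general distribution over box offsets that GFLoss learns. Below is a minimal PyTorch sketch of that representation: the model predicts a discrete distribution over candidate offsets for each box edge and decodes the expectation. The bin count (17) and offset range (0 to 16) are illustrative assumptions here, not necessarily the cited papers' exact configuration.

    import torch
    import torch.nn.functional as F

    def distribution_to_offset(logits: torch.Tensor, max_offset: float = 16.0) -> torch.Tensor:
        """Decode per-edge distribution logits of shape (..., n_bins) to a scalar offset.

        Direct regression commits to one number per edge (a Dirac delta);
        predicting a distribution and taking its expectation lets ambiguous
        edges (occlusion, clutter) spread probability mass across offsets.
        """
        n_bins = logits.shape[-1]
        probs = F.softmax(logits, dim=-1)                      # P(y_i) over the bins
        bins = torch.linspace(0, max_offset, n_bins, device=logits.device)
        return (probs * bins).sum(dim=-1)                      # E[y] = sum_i P(y_i) * y_i

    # Example: one anchor point, four edges (left, top, right, bottom), 17 bins each.
    logits = torch.randn(1, 4, 17)
    offsets = distribution_to_offset(logits)  # shape (1, 4), one decoded offset per edge

A flat learned distribution signals an uncertain edge; GFLv2, the paper this page indexes, builds on this by feeding statistics of the learned distribution (its top values) into a light subnetwork that predicts localization quality.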
“…The lowest layer of the evaluation hierarchy contains the alternative schemes for decision-making: seven test algorithms are selected, Faster RCNN [38], RetinaNet [39], ATSS [40], FoveaBox [41], GFocal Loss [42], PAFPN [43], and RepPoints [44], representing seven schemes. The middle layer contains the criteria considered in the evaluation process, namely four evaluation indexes: the illumination index, environment index, scale index, and angle index.…”
Section: Construction Of Hierarchical Structure (mentioning)
confidence: 99%
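The structure described in this excerpt is an AHP-style hierarchy: alternatives (the seven detectors) at the bottom, criteria (the four indexes) in the middle, and a weighted aggregation on top. A hedged sketch of the aggregation step follows; every weight and score below is a placeholder of our own, not a value from the cited paper.

    # Placeholder criterion weights for the middle layer of the hierarchy.
    criteria_weights = {"illumination": 0.3, "environment": 0.3, "scale": 0.2, "angle": 0.2}

    # Placeholder per-criterion scores for a few of the seven alternative schemes.
    detector_scores = {
        "Faster RCNN": {"illumination": 0.7, "environment": 0.6, "scale": 0.8, "angle": 0.5},
        "RetinaNet":   {"illumination": 0.6, "environment": 0.7, "scale": 0.7, "angle": 0.6},
        "GFocal Loss": {"illumination": 0.8, "environment": 0.7, "scale": 0.9, "angle": 0.6},
    }

    def aggregate(scores, weights):
        # Weighted sum over the criterion layer ranks the alternatives.
        return sum(weights[c] * scores[c] for c in weights)

    ranking = sorted(detector_scores,
                     key=lambda d: aggregate(detector_scores[d], criteria_weights),
                     reverse=True)
    print(ranking)  # alternatives ordered best-first under these placeholder weights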