2022
DOI: 10.1609/aaai.v36i1.19893

CF-DETR: Coarse-to-Fine Transformers for End-to-End Object Detection

Abstract: The recently proposed DEtection TRansformer (DETR) achieves promising performance for end-to-end object detection. However, it has relatively low detection performance on small objects and suffers from slow convergence. This paper observes that DETR performs surprisingly well even on small objects when measuring Average Precision (AP) at decreased Intersection-over-Union (IoU) thresholds. Motivated by this observation, we propose a simple way to improve DETR by refining the coarse features and predicted loca…
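To make the abstract's observation concrete, here is a minimal sketch (not from the paper; box coordinates are invented for illustration) of how a prediction on a small object can fail the standard IoU 0.5 criterion yet count as a true positive at a decreased threshold such as 0.25:

```python
import torch

def box_iou(boxes1, boxes2):
    """Pairwise IoU between two sets of [x1, y1, x2, y2] boxes."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])  # intersection top-left
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

# A 10x10-pixel object predicted a few pixels off: for small objects such a
# localization error is large relative to box size, so IoU drops sharply.
pred = torch.tensor([[10.0, 10.0, 20.0, 20.0]])
gt = torch.tensor([[13.0, 13.0, 23.0, 23.0]])
iou = box_iou(pred, gt).item()
for thr in (0.5, 0.25):
    print(f"IoU={iou:.2f} -> true positive at IoU>={thr}: {iou >= thr}")
# IoU=0.32 -> misses the 0.5 threshold but passes 0.25
```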


Cited by 25 publications (7 citation statements) · References 26 publications

“…POTO [59] proposes to assign as the positive sample the anchor either with the maximum IoU or closest to the object center, strategies adapted from RetinaNet [32] and FCOS, respectively. DETR [5] and its follow-ups [44,4,62,81,34,22] apply Hungarian matching to compute one-to-one positive assignments based on the globally minimal matching cost between all predictions and the ground-truth boxes. Different from the most related work POTO [59], which only uses one-to-many assignment, based on ATSS [76], to help the classification branch of FCOS [56], our approach chooses Hungarian matching to perform both one-to-one and one-to-many matching following DETR and generalizes to various vision tasks.…”
Section: Related Work
confidence: 99%
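As context for the Hungarian matching the quote refers to, here is a minimal sketch of DETR-style one-to-one assignment. The toy cost (L1 box distance plus a negative class probability) stands in for DETR's full matching cost, which also includes a generalized-IoU term:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_boxes, pred_probs, gt_boxes, gt_labels):
    """One-to-one assignment: each ground truth gets exactly one prediction."""
    # L1 box distance between every prediction and every ground truth.
    box_cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    # Classification cost: negative predicted probability of the GT class.
    cls_cost = -pred_probs[:, gt_labels]
    cost = box_cost + cls_cost  # shape (num_preds, num_gts)
    pred_idx, gt_idx = linear_sum_assignment(cost)  # global minimum total cost
    return list(zip(pred_idx, gt_idx))

# Toy example: 4 predictions (queries), 2 ground-truth objects.
rng = np.random.default_rng(0)
pred_boxes = rng.random((4, 4))
pred_probs = rng.random((4, 3))
gt_boxes = rng.random((2, 4))
gt_labels = np.array([0, 2])
print(hungarian_match(pred_boxes, pred_probs, gt_boxes, gt_labels))
```

SciPy's `linear_sum_assignment` handles the rectangular case directly, matching each ground truth to exactly one of the (typically far more numerous) predictions.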
“…Motivated by the success of DETR on a wide variety of vision tasks, many follow-up efforts have improved DETR from various aspects, including redesigning more advanced transformer encoders [81,13,14], transformer decoder architectures [44,73,81,4], or query formulations [62,34,22,74]. Different from most of these previous efforts, we focus on the inefficient training caused by one-to-one matching, which assigns only one query to each ground truth.…”
Section: Introduction
confidence: 99%
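This quote contrasts one-to-one matching with one-to-many matching performed by the same solver. One simple way to realize the latter (a sketch under my own assumptions, not necessarily the cited paper's exact scheme) is to tile each ground-truth column k times before solving, so each object can absorb up to k distinct queries:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_many_match(cost, k=3):
    """Assign up to k predictions per ground truth by tiling cost columns.

    cost: (num_preds, num_gts) matching-cost matrix. Tiling each GT column
    k times before running the Hungarian solver yields a one-to-many
    assignment while still matching each prediction at most once.
    """
    num_gts = cost.shape[1]
    tiled = np.tile(cost, (1, k))                  # (num_preds, num_gts * k)
    pred_idx, col_idx = linear_sum_assignment(tiled)
    return [(p, c % num_gts) for p, c in zip(pred_idx, col_idx)]

cost = np.random.default_rng(1).random((10, 2))
print(one_to_many_match(cost, k=3))  # each GT gets up to 3 distinct queries
```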
“…The authors in [46] proposed Row-Column Decoupled Attention (RCDA), decomposing the 2D attention over key features into two simpler forms: 1D row-wise and column-wise attention. In the case of CF-DETR [47], an alternative to FPN was proposed whereby C5 features were replaced with encoder features at level 5 (E5), resulting in improved object representation. This innovation was named the Transformer Enhanced FPN (TEF) module.…”
Section: Fast Attention for High-Resolution or Multi-Scale Feature Maps
confidence: 99%
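For readers unfamiliar with RCDA, below is a minimal single-head sketch of the row/column decoupling idea as described in the quote: keys are pooled to 1D along each axis, so the usual (N, H*W) attention map is replaced by separate (N, W) and (N, H) maps. Names like `q_row`/`k_col` and the choice of mean pooling are my own assumptions, not the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def rcda(q_row, q_col, k_row, k_col, v, scale):
    """Row-Column Decoupled Attention, single-head sketch.

    q_row, q_col: (N, d) per-query row/column sub-queries
    k_row: (W, d), k_col: (H, d) 1D keys pooled from the 2D feature map
    v: (H, W, d) value feature map
    """
    a_row = F.softmax(q_row @ k_row.t() * scale, dim=-1)  # (N, W)
    a_col = F.softmax(q_col @ k_col.t() * scale, dim=-1)  # (N, H)
    z = torch.einsum("nw,hwd->nhd", a_row, v)  # weighted sum over width
    return torch.einsum("nh,nhd->nd", a_col, z)  # then over height

H, W, d, N = 32, 32, 256, 100
v = torch.randn(H, W, d)
k_row = v.mean(dim=0)  # (W, d): pool over height for row keys (assumed)
k_col = v.mean(dim=1)  # (H, d): pool over width for column keys (assumed)
q_row, q_col = torch.randn(N, d), torch.randn(N, d)
out = rcda(q_row, q_col, k_row, k_col, v, scale=d ** -0.5)
print(out.shape)  # torch.Size([100, 256])
```

The memory saving is visible in the shapes: two small attention maps of size N×W and N×H replace one N×(H·W) map.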
“…After that, the Transformer decoder takes the encoded image features F and a small set of object queries Q ∈ R^{N×d} as input, and then produces the detection results. Here, N denotes the number of object queries, which is typically set to 100∼300 [5,9,10,11,64,12,55,13,65,66,67,68].…”
Section: A Brief Review of DETR
confidence: 99%
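The quoted review maps directly onto a few lines of PyTorch. Below is a minimal sketch of a DETR-style decoder built from the generic nn.TransformerDecoder; real DETR additionally uses positional encodings, per-layer auxiliary losses, and an MLP box head, all omitted here:

```python
import torch
import torch.nn as nn

class DETRDecoderHead(nn.Module):
    """Sketch of a DETR-style decoder: N learned queries attend to encoded
    image features and are mapped to class logits and normalized boxes."""
    def __init__(self, d=256, num_queries=100, num_classes=91, num_layers=6):
        super().__init__()
        self.queries = nn.Embedding(num_queries, d)  # Q in R^{N x d}
        layer = nn.TransformerDecoderLayer(d_model=d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.class_head = nn.Linear(d, num_classes + 1)  # +1 for "no object"
        self.box_head = nn.Linear(d, 4)                  # (cx, cy, w, h)

    def forward(self, memory):
        # memory: (B, HW, d) encoded image features F from the encoder
        q = self.queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        hs = self.decoder(q, memory)                     # (B, N, d)
        return self.class_head(hs), self.box_head(hs).sigmoid()

model = DETRDecoderHead()
feats = torch.randn(2, 32 * 32, 256)  # e.g. a 32x32 feature map, flattened
logits, boxes = model(feats)
print(logits.shape, boxes.shape)       # (2, 100, 92) (2, 100, 4)
```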