2023
DOI: 10.1145/3513133

Towards Accurate Oriented Object Detection in Aerial Images with Adaptive Multi-level Feature Fusion

Abstract: Detecting objects in aerial images is a long-standing and challenging problem since the objects in aerial images vary dramatically in size and orientation. Most existing neural-network-based methods are not robust enough to provide accurate oriented object detection results in aerial images since they do not consider the correlations between different levels and scales of features. In this paper, we propose a novel two-stage network-based detector with adaptive f …

Citations: cited by 17 publications (5 citation statements)
References: 36 publications
“…Oriented R-CNN [26] R50 80.87; R3Det-GWD [44] R152 80.19; R3Det-KLD [45] R152 80.63; AFF-Det [41] R50 80.73; KFIoU [46] Swin-T 80.93; RVSA [47] ViT-B 81.01; S²A-Net [1] R50 79.42; ReDet [38] Re-R50 80.10; AOPG [39] R50 80.66; R3Det [2] R152 76.47; G-Rep [32] Swin-T 80.16…”
Section: Methods Backbone mAP
Citation type: mentioning
confidence: 99%
“…CGNet [39] uses self-attention to enhance communication between pyramid levels. AFF-Det [14, 18] maps ROIs to all levels and applies a unified supervisory signal to alleviate the semantic gap. The authors of [16, 17] perform feature refinement after scaling.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…The mainstream method [9] conveys the semantic information layer-by-layer using lateral connections. Other improved works [13, 14, 15, 16, 17, 18] apply semantic reconfiguration to single-scale aggregated features resampled from all pyramid levels. However, we find that these methods have certain shortcomings: First, fusing all pyramid features with different receptive fields into a single scale introduces a lot of irrelevant semantic information.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Different from natural images, UAVs usually capture aerial images under varying illumination and uncontrolled outdoor conditions, which requires object detection models with strong robustness [20, 21]. Existing methods are mainly carried out in terms of model structure and labeled data [22].…”
Section: Related Work
Citation type: mentioning
confidence: 99%