Generally, the objects of interest in aerial images differ substantially from those in natural images; in particular, remote sensing objects tend to have more extreme and varied aspect ratios. Existing convolutional networks use receptive fields with a fixed, square aspect ratio, so the receptive field either contains irrelevant background or fails to cover the entire object. To this end, we propose Horizontal and Vertical Convolution (HVConv), a plug-and-play module designed for objects with diverse aspect ratios. Our method introduces a horizontal convolution and a vertical convolution that expand the receptive field in the horizontal and vertical directions, respectively, while reducing redundant coverage, so that remote sensing objects with different aspect ratios obtain better receptive-field coverage and therefore more accurate feature representations. In addition, we design an attention module that dynamically aggregates the two sub-modules to further improve feature coverage. Extensive experiments on the DOTA and HRSC2016 datasets show that HVConv yields accuracy improvements across diverse detection architectures and achieves state-of-the-art accuracy (77.60% mAP with DOTA single-scale training and 81.07% mAP with DOTA multi-scale training). Extensive ablation studies further verify the effectiveness of our model.
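To make the idea concrete, the following minimal PyTorch sketch pairs a 1×k horizontal convolution with a k×1 vertical convolution and aggregates the two branches with a pooled attention gate. The kernel size, depthwise grouping, gating design, residual connection, and all names here are illustrative assumptions, not the exact HVConv implementation described above.

```python
import torch
import torch.nn as nn


class HVConvSketch(nn.Module):
    """Illustrative sketch of a horizontal/vertical convolution block.

    Assumptions (not from the paper): kernel size 7, depthwise branches,
    a squeeze-and-excitation-style gate, and a residual connection.
    """

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        pad = kernel_size // 2
        # Horizontal branch: 1 x k kernel widens the receptive field horizontally.
        self.h_conv = nn.Conv2d(channels, channels, (1, kernel_size),
                                padding=(0, pad), groups=channels)
        # Vertical branch: k x 1 kernel widens the receptive field vertically.
        self.v_conv = nn.Conv2d(channels, channels, (kernel_size, 1),
                                padding=(pad, 0), groups=channels)
        # Attention gate: one weight per branch from globally pooled features.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.h_conv(x)
        v = self.v_conv(x)
        w = self.gate(x)                      # (N, 2, 1, 1) branch weights
        out = w[:, 0:1] * h + w[:, 1:2] * v   # dynamic aggregation of the two branches
        return out + x                        # residual connection (assumed)


if __name__ == "__main__":
    feat = torch.randn(2, 64, 128, 128)
    block = HVConvSketch(channels=64)
    print(block(feat).shape)  # torch.Size([2, 64, 128, 128])
```

Because the block preserves the input shape and channel count, a module of this kind can be dropped into an existing detection backbone in place of (or alongside) a standard square convolution, which is what makes it plug-and-play.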