2019
DOI: 10.3390/rs11131594

A2RMNet: Adaptively Aspect Ratio Multi-Scale Network for Object Detection in Remote Sensing Images

Abstract: Object detection is a significant and challenging problem in the study area of remote sensing and image analysis. However, most existing methods are easy to miss or incorrectly locate objects due to the various sizes and aspect ratios of objects. In this paper, we propose a novel end-to-end Adaptively Aspect Ratio Multi-Scale Network (A2RMNet) to solve this problem. On the one hand, we design a multi-scale feature gate fusion network to adaptively integrate the multi-scale features of objects. This ne…

Cited by 72 publications (36 citation statements)
References 43 publications
“…Most of proposal-region-based methods [2,22,[41][42][43]47, 50] are based on the "listener" strategy, which first combines language features with visual features of proposal regions, and then select the target region that best matches the input expression from these proposals. The proposal regions are typically extracted by a pretrained object detector (e.g., Faster R-CNN [34], Mask R-CNN [9] and others [4,17,[30][31][32]). To align visual regions with the expression more powerful, methods in [22,47] proposed to decompose the expression into three components (subject, localization and relationship), and leveraged cross-modality attentions to focus on relevant regions.…”
Section: Related Work (mentioning)
confidence: 99%
“…These parameters are usually selected by trial-and-error, which requires expert knowledge and is a time-consuming and uncertain task [12,55]. Although some automatic techniques, like the Taguchi optimization method [55], have been introduced for defining the optimal parameters for the segmentation process, the process of detecting optimal objects is still a challenging task, mostly because of the diversity in the sizes and shapes of the target features [56].…”
Section: Multi-scale Image Segmentation (mentioning)
confidence: 99%
“…With the rapid development of deep convolutional neural networks (CNNs) [1] in recent years, the conventional object detection methods [2,3] have made some remarkable achievements in natural images. However, due to the huge scale variations of the vast majority of objects and the compact distribution of many small objects in remote sensing images, it still remains a tremendous challenge for locating and predicting the target objects [4,5].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, the resizing process may result in small objects becoming smaller and more likely to be lost in the deeper layers. For solving this problem, the general solution is to simply cut large-scale images into small chunks [5,16]. However, when the cut images include relatively large objects, such as ground track field, these objects may be broken up into small pieces and make the network hard recognize.…”
Section: Introduction (mentioning)
confidence: 99%