2019
DOI: 10.48550/arxiv.1908.00328
Preprint

ScarfNet: Multi-scale Features with Deeply Fused and Redistributed Semantics for Enhanced Object Detection

Abstract: Convolutional neural networks (CNNs) have driven significant progress in object detection. To detect objects of various sizes, object detectors often exploit a hierarchy of multi-scale feature maps, called a feature pyramid, which is readily obtained from the CNN architecture. However, the performance of these object detectors is limited because the bottom-level feature maps, which pass through fewer convolutional layers, lack the semantic information needed to capture the characteristics of small …
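The feature pyramid the abstract refers to is a stack of progressively coarser feature maps produced as a CNN backbone downsamples its input. A minimal NumPy sketch of that multi-scale structure, using 2x2 average pooling as a stand-in for the backbone's stride-2 layers (the function names and level count here are illustrative assumptions, not ScarfNet's architecture):

```python
import numpy as np

def avg_pool2x2(x):
    """Downsample an (H, W, C) map by 2x2 average pooling (H and W even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def feature_pyramid(x, levels=3):
    """Return a list of progressively downsampled maps, finest first."""
    pyramid = [x]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2x2(pyramid[-1]))
    return pyramid

# A toy 32x32 single-channel "feature map" standing in for a backbone stage.
fmap = np.random.rand(32, 32, 1)
for level in feature_pyramid(fmap):
    print(level.shape)  # (32, 32, 1), then (16, 16, 1), then (8, 8, 1)
```

The semantic gap the abstract describes arises because the fine (bottom) levels of such a pyramid have passed through fewer layers than the coarse ones, which is what fusion methods like ScarfNet aim to correct.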


Cited by 1 publication (2 citation statements)
References 20 publications
“…Libra-RCNN [39] fuses the 5-level feature maps from FPN [42] and uses Gaussian Non-Local [43] attention to obtain balanced semantic features. ScarfNet [40] generates semantically strong features for each pyramid scale via a bidirectional long short-term memory (biLSTM) [44] and channel-wise attention. DES [41], built upon SSD [45], adds an extra semantic attention branch supervised with weak segmentation ground truth for semantic enrichment.…”
Section: B. Attention Mechanism
confidence: 99%
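The channel-wise attention this statement attributes to ScarfNet can be sketched in squeeze-and-excitation style: globally pool each channel, pass the result through a small bottleneck, and use a sigmoid gate to rescale the channels. This NumPy sketch is a generic illustration of the mechanism under assumed shapes, not ScarfNet's actual implementation:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Rescale each channel of an (H, W, C) map by a learned gate in (0, 1)."""
    squeeze = x.mean(axis=(0, 1))                 # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)        # ReLU bottleneck -> (R,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate -> (C,)
    return x * gate                               # broadcast over H and W

rng = np.random.default_rng(0)
C, R = 8, 2                                       # channels, bottleneck width
x = rng.standard_normal((4, 4, C))
w1 = rng.standard_normal((R, C))                  # squeeze -> bottleneck
w2 = rng.standard_normal((C, R))                  # bottleneck -> gate
out = channel_attention(x, w1, w2)
print(out.shape)  # (4, 4, 8)
```

Because the gate is a single scalar per channel, the operation reweights whole feature channels rather than individual spatial locations, which is what distinguishes channel-wise attention from the spatial (non-local) attention used in Libra-RCNN.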
“…Yang et al. [46] proposed a Multi-Dimensional Attention Network to strengthen the response of the region of interest. In contrast to [10], [37]-[40], our proposed method adds accurate supervision to guide the learning of the attention mechanism. Besides, our attention mechanism has a simpler structure than the biLSTM in ScarfNet [40] and the atrous convolution in DES [41].…”
Section: B. Attention Mechanism
confidence: 99%