2018
DOI: 10.1007/s11263-018-1101-7
Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection

Abstract: In this paper, we propose a zoom-out-and-in network for generating object proposals. A key observation is that it is difficult to classify anchors of different sizes with the same set of features. Anchors of different sizes should be placed accordingly based on different depth within a network: smaller boxes on high-resolution layers with a smaller stride while larger boxes on low-resolution counterparts with a larger stride. Inspired by the conv/deconv structure, we fully leverage the low-level local details …


Cited by 93 publications (44 citation statements)
References 50 publications
“…Different from Kong et al [74] and Li et al [75], Multi-Level FPN [76] stacks one highest and one lower level feature layers and recursively outputs a set of pyramidal features, which are all finally combined into a single feature pyramid in a scale-wise manner (see Figure 10(g)). Feature fusion module (FFM) v1 equalizes the dimensions of the input feature maps by a sequence of 1 × 1 convolution and upsampling operations.…”
Section: Methods Using Backbone Features As a Basis
confidence: 99%
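The excerpt above describes FFM v1 as equalizing the dimensions of its input feature maps via 1 × 1 convolution and upsampling before fusion. A minimal NumPy sketch of that general pattern (the channel counts, nearest-neighbor upsampling, and concatenation-based fusion are illustrative assumptions, not the cited module's actual configuration):

```python
import numpy as np

def conv1x1(x, w):
    # x: (B, C_in, H, W); w: (C_out, C_in). A 1x1 convolution is a
    # per-pixel linear map over the channel dimension.
    return np.einsum('oc,bchw->bohw', w, x)

def upsample_nearest(x, factor):
    # Repeat rows and columns to enlarge the spatial grid.
    return x.repeat(factor, axis=2).repeat(factor, axis=3)

def ffm_v1_sketch(high, low, w_high, w_low):
    # Equalize channel dims with 1x1 convs, upsample the coarse map to
    # the fine map's resolution, then fuse by channel concatenation.
    h = upsample_nearest(conv1x1(high, w_high),
                         low.shape[2] // high.shape[2])
    l = conv1x1(low, w_low)
    return np.concatenate([h, l], axis=1)

rng = np.random.default_rng(0)
high = rng.standard_normal((1, 1024, 10, 10))  # deep, low-resolution map
low = rng.standard_normal((1, 512, 20, 20))    # shallow, high-resolution map
fused = ffm_v1_sketch(high, low,
                      rng.standard_normal((256, 1024)),
                      rng.standard_normal((256, 512)))
print(fused.shape)  # (1, 512, 20, 20)
```

Both inputs end up at the same spatial size and a shared channel budget, which is the precondition the excerpt's "equalizes the dimensions" step establishes before pyramid construction.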
“…This approach has been found to be effective for segmentation [177,241] and human pose estimation [194], and has been widely exploited by both one-stage and two-stage detectors to alleviate problems of scale variation across object instances. Representative methods include SharpMask [214], Deconvolutional Single Shot Detector (DSSD) [77], Feature Pyramid Network (FPN) [167], Top Down Modulation (TDM) [247], Reverse connection with Objectness prior Network (RON) [136], ZIP [156], Scale Transfer Detection Network (STDN) [321], RefineDet [308], StairNet [283], Path Aggregation Network (PANet) [174], Feature Pyramid Reconfiguration (FPR) [137], DetNet [164], Scale Aware Network (SAN) [133], Multiscale Location aware Kernel Representation (MLKP) [278] and M2Det [315], as shown in Table 7 and contrasted in Fig. 17.…”
Section: STDN [321]
confidence: 99%
“…Several recent attempts have been made using attention mechanisms to increase the capabilities of CNNs in various vision tasks, including classification [19], detection [20], segmentation [21], image captioning [22], and visual question answering [23]. Attention mechanisms guide the model to emphasize the most salient features that are beneficial for a specific task while avoiding useless ones.…”
Section: Attention and Gating Mechanism
confidence: 99%
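The attention mechanisms described in this excerpt share a common shape: compute a saliency signal, then gate the features with it. A minimal NumPy sketch of squeeze-and-excitation-style channel gating, one representative instance of such a mechanism (the bottleneck sizes and random weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    # Squeeze: global average pool each channel to one descriptor.
    s = x.mean(axis=(2, 3))                    # (B, C)
    # Excite: a small bottleneck MLP yields per-channel gates in (0, 1).
    g = sigmoid(np.maximum(s @ w1, 0.0) @ w2)  # (B, C)
    # Reweight: emphasize salient channels, suppress the rest.
    return x * g[:, :, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 64, 8, 8))
w1 = rng.standard_normal((64, 16))   # bottleneck down-projection
w2 = rng.standard_normal((16, 64))   # up-projection back to C channels
y = channel_attention(x, w1, w2)
print(y.shape)  # (2, 64, 8, 8)
```

Because each gate lies strictly in (0, 1), the output is a soft reweighting of the input: no channel is amplified, only attenuated to varying degrees, which is how gating steers the model toward task-relevant features.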