2015
DOI: 10.1109/tpami.2015.2389824
|View full text |Cite
|
Sign up to set email alerts
|

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Abstract: Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224 × 224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

6
5,078
0
24

Year Published

2017
2017
2024
2024

Publication Types

Select...
5
4

Relationship

0
9

Authors

Journals

citations
Cited by 9,651 publications
(5,108 citation statements)
references
References 21 publications
6
5,078
0
24
Order By: Relevance
“…These algorithms are mainly classified into two groups: one is object detection method based on region proposal [58][59][60], which is a mainstream algorithm, e.g., RCNN [31], SPPNet [61], Fast-RCNN [34], Faster-RCNN [62], and MSRA recently proposes algorithm R-FCN [63]. The other is not using the region proposal method to detection, e.g., YOLO [64] and SSD [65].…”
Section: Related Workmentioning
confidence: 99%
See 1 more Smart Citation
“…These algorithms are mainly classified into two groups: one is object detection method based on region proposal [58][59][60], which is a mainstream algorithm, e.g., RCNN [31], SPPNet [61], Fast-RCNN [34], Faster-RCNN [62], and MSRA recently proposes algorithm R-FCN [63]. The other is not using the region proposal method to detection, e.g., YOLO [64] and SSD [65].…”
Section: Related Workmentioning
confidence: 99%
“…Due to the fact that there is a large amount of overlap between these RoIs, redundant calculations result in inefficiencies. SSP-Net [61] and Fast-RCNN [34] propose a shared feature method that is extracted only one time for the whole image for this problem. And then, about 2000 RoIs are mapped according to their location information to the feature vector of the whole image to obtain the features of each RoI, so it greatly improves the speed of calculation because the feature extraction calculations of different RoI can be shared.…”
Section: Related Workmentioning
confidence: 99%
“…However, because it performs a ConvNet for each object proposal, the time spent on computing region proposals and features (13s/image on a Graphics Processing Unit (GPU) or 53s/image on a CPU) cannot be ignored for an object detection system. Inspired by the Spatial pyramid pooling networks (SPPnets) [40], Girshick [34] proposed Fast R-CNN to speed up R-CNN by sharing computation. The network processed all the images with a CNN to produce a conv feature map.…”
Section: Deep Learning In Computer Visionmentioning
confidence: 99%
“…High spatial resolution (HSR) remote sensing imaging sensors can now acquire aerial and satellite images with abundant detail and complex spatial structural information, which can be used in a wide range of civil and engineering applications, such as segmentation [4], scene annotation [5], object detection [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22] (e.g., airplane detection [6,12], urban area detection [13], vehicle detection [21,22]), scene classification and recognition [23][24][25][26][27], etc. Differing from natural imagery obtained by the camera on the ground from a horizontal view, HSR remote sensing imagery is obtained by satellite-borne or space-borne sensors from a top-down view, which is an approach that can be easily influenced by weather and illumination conditions.…”
Section: Introductionmentioning
confidence: 99%