2014
DOI: 10.1007/978-3-319-10578-9_23

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Abstract: Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224×224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also rob…
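The abstract's central claim, a fixed-length representation regardless of input size, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, and the floor/ceil rule for bin edges are all assumptions chosen for illustration. For each level n, the feature map is split into an n × n grid and each cell is max-pooled, so the output length depends only on the channel count and the levels.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    Illustrative sketch only: the levels and the bin-edge rule are
    assumptions, not taken from the paper.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Integer floor/ceil edges so the n bins cover every row,
            # even when `height` is not divisible by n.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                # One max value per channel for this bin.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    # Output length is C * sum(n * n for n in levels), independent of H and W.
    return np.concatenate(pooled)


# Two feature maps with different spatial sizes give vectors of equal length.
small = spatial_pyramid_pool(np.random.rand(8, 13, 13))
wide = spatial_pyramid_pool(np.random.rand(8, 10, 17))
print(small.shape, wide.shape)
```

With 8 channels and levels (1, 2, 4), both vectors have 8 × (1 + 4 + 16) = 168 entries, which is what lets a fully connected layer follow a convolutional stack that accepts variable-size inputs.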

Citations: cited by 1,941 publications (1,245 citation statements)
References: 41 publications
“…2D-CNN has been demonstrated with great promise in the field of computer vision and image processing, with applications such as image classification [38][39][40], object detection [41,42], and depth estimation from a single image [43]. The most significant advantage of 2D-CNN is that it offers a principled way to extract features directly from the raw input imagery.…”
Section: D Convolution Operation
Citation type: mentioning
confidence: 99%
“…Therefore, it was soon enhanced to learn from volumetric data represented by occupancy grids [31], [46]. Another issue tackled by the CNN community was developing solutions for obtaining invariance to different kinds of transformations [28] and scale changes [18]. In [16], an adaptive, multi-scale CNN architecture was proposed to jointly perform depth prediction, surface normal estimation and semantic labeling.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…International Conference on Mechanical, Electronic, Control and Automation Engineering (MECAE 2017) SPPnet [4], Fast R-CNN [5], Faster R-CNN [6], etc. Among them, Faster R-CNN has the best performance, both in detection time and accuracy.…”
Section: Advances In Engineering Research (AER) Volume 61
Citation type: mentioning
confidence: 99%