2017
DOI: 10.1109/tpami.2016.2578328

Object Instance Segmentation and Fine-Grained Localization Using Hypercolumns

Abstract: Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as a feature representation. However, the information in this layer may be too coarse spatially to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show result…
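
The hypercolumn definition in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `hypercolumn`, the `[channel][row][col]` layout, and the toy feature maps are hypothetical, and the abstract does not say how coarser layers are aligned with pixels, so nearest-neighbour lookup is assumed here for simplicity.

```python
from typing import List, Sequence

# A feature map is indexed as [channel][row][col] (layout assumed for this sketch).
FeatureMap = List[List[List[float]]]


def hypercolumn(feature_maps: Sequence[FeatureMap], row: int, col: int,
                image_h: int, image_w: int) -> List[float]:
    """Concatenate, across layers, the activations lying above image pixel (row, col)."""
    if not (0 <= row < image_h and 0 <= col < image_w):
        raise IndexError("pixel lies outside the image")
    column: List[float] = []
    for fmap in feature_maps:
        h, w = len(fmap[0]), len(fmap[0][0])
        # Assumption: nearest-neighbour lookup of the cell that covers the pixel.
        r = row * h // image_h
        c = col * w // image_w
        for channel in fmap:
            column.append(channel[r][c])
    return column


# Toy layers for a 4x4 image: a fine 4x4 map, a 2x2 map with two channels,
# and a single 1x1 map standing in for the coarse last layer.
fine = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
mid = [[[10.0, 11.0], [12.0, 13.0]], [[20.0, 21.0], [22.0, 23.0]]]
coarse = [[[99.0]]]

example = hypercolumn([fine, mid, coarse], row=3, col=1, image_h=4, image_w=4)
print(example)  # [13.0, 12.0, 22.0, 99.0]
```

The resulting vector has one entry per channel across all layers, so the fine layer contributes precise location and the coarse layer contributes the value shared by every pixel beneath it.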

Cited by 87 publications (31 citation statements)
References 50 publications
“…Faster RCNN [229,230]: Although Fast RCNN significantly sped up the detection process, it still relies on external region proposals, whose computation is exposed as the new speed bottleneck in Fast RCNN. Recent work has shown that CNNs have a remarkable ability to localize objects in CONV layers [317,318,46,200,97], an ability which is weakened in the FC layers. Therefore, the selective search can be replaced by a CNN in producing region proposals.…”
Section: Region Based (Two Stage) Framework
Citation type: mentioning
confidence: 99%
“…(1) Detecting with combined features of multiple CNN layers: Many approaches, including Hypercolumns [97], HyperNet [135], and ION [11], combine features from multiple layers before making a prediction. Such feature combination is commonly accomplished via concatenation, a classic neural network idea that concatenates features from different layers, architectures which have recently become popular for semantic segmentation [177,241,97]. As shown in Fig.…”
Section: Handling Of Object Scale Variations
Citation type: mentioning
confidence: 99%
“…Our approach takes advantage of the previous insights, and consists of a modularized network that exploits both the possibility of segmentation based on combinations of multi-domain information, and the feasibility of producing filters that respond to objects being referred to by processing the linguistic information. Following the spirit of [24][25] [26], we use skip connections between the downsampling process and the upsampling module to output finely-defined segmentations. We employ the concatenation strategy of [3] but include richer visual and language features.…”
Section: Recurrent Multimodal Interaction
Citation type: mentioning
confidence: 99%
“…Convolutional neural networks [8,9], originally proposed by LeCun et al for handwritten digit recognition, have been recently succeeded in image identification, detection, and segmentation tasks [10][11][12][13][14][15]. CNN is proved to have a strong ability in large scale image classification.…”
Section: Convolutional Neural Network
Citation type: mentioning
confidence: 99%