Superpixel Convolutional Networks Using Bilateral Inceptions

Gadde, Raghudeep; Jampani, Varun; Kiefel, Martin; Kappler, Daniel; Gehler, Peter V.

doi:10.1007/978-3-319-46448-0_36

Cited by 115 publications

(97 citation statements)

References 36 publications

(71 reference statements)

“…ing layers. This prompted several works [51,58,15,35,21] to propose specialized CNN modules that help restore the spatial resolution of the network output.…”

Section: Introduction (mentioning)

confidence: 99%

Gated-SCNN: Gated Shape CNNs for Semantic Segmentation

Takikawa

Acuna

Jampani

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite

652

391

Current state-of-the-art methods for image segmentation form a dense image representation where the color, shape and texture information are all processed together inside a deep CNN. This however may not be ideal as they contain very different type of information relevant for recognition. Here, we propose a new two-stream CNN architecture for semantic segmentation that explicitly wires shape information as a separate processing branch, i.e. shape stream, that processes information in parallel to the classical stream. Key to this architecture is a new type of gates that connect the intermediate layers of the two streams. Specifically, we use the higher-level activations in the classical stream to gate the lower-level activations in the shape stream, effectively removing noise and helping the shape stream to only focus on processing the relevant boundary-related information. This enables us to use a very shallow architecture for the shape stream that operates on the image-level resolution. Our experiments show that this leads to a highly effective architecture that produces sharper predictions around object boundaries and significantly boosts performance on thinner and smaller objects. Our method achieves state-of-the-art performance on the Cityscapes benchmark, in terms of both mask (mIoU) and boundary (F-score) quality, improving by 2% and 4% over strong baselines.

“…Spectral graph theory [12], and in particular the Normalized Cut [62] criterion provides a way to further integrate global image information for better segmentation. More recently, superpixel approaches [1] emerge to be a popular pre-processing step that helps reduce the computation, or can be used to refine the semantic segmentation predictions [20]. However, the challenge of perceptual organization is to process information from different levels together to form consensus segmentation.…”

Section: Related Work (mentioning)

confidence: 99%

SegSort: Segmentation by Discriminative Sorting of Segments

Hwang

Shi

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

114

Almost all existing deep learning approaches for semantic segmentation tackle this task as a pixel-wise classification problem. Yet humans understand a scene not in terms of pixels, but by decomposing it into perceptual groups and structures that are the basic building blocks of recognition. This motivates us to propose an end-to-end pixelwise metric learning approach that mimics this process. In our approach, the optimal visual representation determines the right segmentation within individual images and associates segments with the same semantic classes across images. The core visual learning problem is therefore to maximize the similarity within segments and minimize the similarity between segments. Given a model trained this way, inference is performed consistently by extracting pixel-wise embeddings and clustering, with the semantic label determined by the majority vote of its nearest neighbors from an annotated set. As a result, we present the SegSort, as a first attempt using deep learning for unsupervised semantic segmentation, achieving 76% performance of its supervised counterpart. When supervision is available, SegSort shows consistent improvements over conventional approaches based on pixel-wise softmax training. Additionally, our approach produces more precise boundaries and consistent region predictions. The proposed SegSort further produces an interpretable result, as each choice of label can be easily understood from the retrieved nearest segments.

“…The proposed BCLs are beneficial in neural networks as they allow to redefine proximity of pixels w.r.t. different characteristics [12,20,30]. Moreover, BCLs can inherently cope with sparse data [24], e.g.…”

Section: Related Work (mentioning)

confidence: 99%

“…All of these filters have been included into deep networks, e.g. for semantic segmentation [12,14], image processing [45], or video classification [42].…”

Section: Related Work (mentioning)

confidence: 99%

Learning Task-Specific Generalized Convolutions in the Permutohedral Lattice

Wannenwetsch

Kiefel

Gehler

et al. 2019

Lecture Notes in Computer Science

Self Cite

Dense prediction tasks typically employ encoder-decoder architectures, but the prevalent convolutions in the decoder are not image-adaptive and can lead to boundary artifacts. Different generalized convolution operations have been introduced to counteract this. We go beyond these by leveraging guidance data to redefine their inherent notion of proximity. Our proposed network layer builds on the permutohedral lattice, which performs sparse convolutions in a high-dimensional space allowing for powerful non-local operations despite small filters. Multiple features with different characteristics span this permutohedral space. In contrast to prior work, we learn these features in a task-specific manner by generalizing the basic permutohedral operations to learnt feature representations. As the resulting objective is complex, a carefully designed framework and learning procedure are introduced, yielding rich feature embeddings in practice. We demonstrate the general applicability of our approach in different joint upsampling tasks. When adding our network layer to state-of-the-art networks for optical flow and semantic segmentation, boundary artifacts are removed and the accuracy is improved.
