Towards Bounding-Box Free Panoptic Segmentation

Bonde, Ujwal; Alcantarilla, Pablo F.; Leutenegger, Stefan

doi:10.48550/arxiv.2002.07705

Cited by 4 publications

(4 citation statements)

References 36 publications


“…SSAP [27] exploits the pixel-pair affinity pyramid [58] enabled by an efficient graph partition method [42]. BBFNet [7] obtains instance segmentation results by Watershed transform [79,4] and Hough-voting [5,47]. Recently, Panoptic-DeepLab [18], a simple, fast, and strong approach for bottom-up panoptic segmentation, employs a class-agnostic instance segmentation branch involving a simple instance center regression [41,77,61], coupled with DeepLab semantic segmentation outputs [12,14,15].…”

Section: Related Work (mentioning)

confidence: 99%

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

Wang, Zhu, Green et al. 2020

Lecture Notes in Computer Science



Convolution exploits locality for efficiency at a cost of missing long-range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works prove it possible to stack self-attention layers to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computation complexity and allows performing attention within a larger or even global region. In companion, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that one could stack to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves 2.8% PQ over bottom-up state-of-the-art on COCO test-dev. This previous state-of-the-art is attained by our small variant that is 3.8x parameter-efficient and 27x computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
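As a rough illustration of why factorizing 2D self-attention into two 1D self-attentions reduces computation, the sketch below counts query-key pairs only. It assumes global attention over an H x W feature map and ignores channels, heads, constant factors, and the position-sensitive terms; the function names are hypothetical and not taken from the paper.

```python
def full_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every pixel attends to every pixel in 2D."""
    n = height * width
    return n * n


def axial_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs for one height-axis pass plus one width-axis pass.

    Each pixel attends to the `height` pixels in its column, then to the
    `width` pixels in its row.
    """
    n = height * width
    return n * (height + width)


if __name__ == "__main__":
    h, w = 64, 64  # illustrative feature-map size, chosen arbitrarily
    print(full_attention_pairs(h, w))   # (64*64)^2 = 16777216
    print(axial_attention_pairs(h, w))  # 64*64*(64+64) = 524288
```

Under these simplifying assumptions the pair count drops from (HW)^2 to HW(H+W), which is what makes attention over a larger or even global region affordable.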


“…Then, the instance segments ('thing') and semantic segments ('stuff') [12] are fused by merging modules [52-54, 63, 70, 90, 92] to generate panoptic segmentation. Other proxy-based methods typically start with semantic segments [11,13,16] and group 'thing' pixels into instance segments with various proxy tasks, such as instance center regression [19,42,56,67,80,84,91], Watershed transform [4,8,82], Hough-voting [5,8,51], or pixel affinity [8,29,43,61,77]. DetectoRS [71] achieved the state-of-the-art in this category with recursive feature pyramid and switchable atrous convolution.…”

Section: Related Work (mentioning)

confidence: 99%

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

Yu, Wang, Kim et al. 2022

Preprint


We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering. It rethinks the existing transformer architectures used in segmentation and detection; CMT-DeepLab considers the object queries as cluster centers, which fill the role of grouping the pixels when applied to segmentation. The clustering is computed with an alternating procedure, by first assigning pixels to the clusters by their feature affinity, and then updating the cluster centers and pixel features. Together, these operations comprise the Clustering Mask Transformer (CMT) layer, which produces cross-attention that is denser and more consistent with the final segmentation task. CMT-DeepLab improves the performance over prior art significantly by 4.4% PQ, achieving a new state-of-the-art of 55.7% PQ on the COCO test-dev set.
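The alternating procedure described in this abstract (assign pixels to clusters by feature affinity, then update the cluster centers) resembles a k-means-style loop. The sketch below is a plain hard-assignment version under that assumption: it uses dot-product affinity and mean updates, does not update pixel features, and is not the CMT layer itself. All function names are hypothetical.

```python
def dot(a, b):
    """Dot-product affinity between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def assign(pixels, centers):
    """Give each pixel the index of the center it has the highest affinity with."""
    return [
        max(range(len(centers)), key=lambda k: dot(p, centers[k]))
        for p in pixels
    ]


def update(pixels, labels, centers):
    """Move each center to the mean feature of its assigned pixels."""
    new_centers = []
    for k, center in enumerate(centers):
        members = [p for p, label in zip(pixels, labels) if label == k]
        if not members:
            new_centers.append(center)  # keep an empty cluster where it was
            continue
        dim = len(center)
        new_centers.append(
            [sum(m[d] for m in members) / len(members) for d in range(dim)]
        )
    return new_centers


def cluster(pixels, centers, steps=3):
    """Alternate assignment and center update for a fixed number of steps."""
    for _ in range(steps):
        labels = assign(pixels, centers)
        centers = update(pixels, labels, centers)
    return assign(pixels, centers), centers


if __name__ == "__main__":
    # Four toy 2-D pixel features and two initial centers (made-up values).
    pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels, centers = cluster(pixels, [[1.0, 0.0], [0.0, 1.0]])
    print(labels)
```

In this toy setting each center plays the role the abstract gives to an object query: it groups the pixels whose features are closest to it.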


“…Contrary to box-based approaches, box-free methods typically start with semantic segments [12,14,16]. Then, instance segments are obtained by grouping 'thing' pixels with various methods, such as instance center regression [44,86,70,100,20], Watershed transform [88,3,8], Hough-voting [4,53,8], or pixel affinity [45,66,81,30,8]. Recently, Axial-DeepLab [89] advanced the state-of-the-art by equipping Panoptic-DeepLab [21] with a fully axial-attention [35] backbone.…”

Section: Related Work (mentioning)

confidence: 99%

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Wang, Zhu, Adam et al. 2020

Preprint


We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set.
