Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413990

Cap2Seg: Inferring Semantic and Spatial Context from Captions for Zero-Shot Image Segmentation

Abstract: Zero-shot image segmentation refers to the task of segmenting pixels of specific unseen semantic classes. Previous methods mainly rely on historic segmentation tasks, such as using the semantic embedding or word embedding of class names to infer a new segmentation model. In this work we describe Cap2Seg, a novel solution for zero-shot image segmentation that harnesses accompanying image captions to intelligently infer spatial and semantic context for the task. As our main insight, …
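The abstract is truncated above, so the following is only a loose, hedged illustration of the general idea it names, not the authors' method: caption words can hint at which unseen classes are present in an image by comparing them with class names in a word-embedding space. The embedding table below is a toy placeholder standing in for real vectors such as Word2Vec or GloVe.

```python
# Minimal sketch (assumed, not Cap2Seg's actual model): score unseen classes
# by the best word-embedding match between the class name and caption words.
import numpy as np

# Toy placeholder vectors; real systems would load pretrained embeddings.
WORD_VECS = {
    "frisbee": np.array([0.9, 0.1, 0.0]),
    "disc":    np.array([0.8, 0.2, 0.1]),
    "dog":     np.array([0.0, 0.9, 0.2]),
    "grass":   np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def unseen_class_scores(caption, unseen_classes):
    """Score how strongly a caption suggests each unseen class is present."""
    tokens = [t for t in caption.lower().split() if t in WORD_VECS]
    scores = {}
    for cls in unseen_classes:
        cls_vec = WORD_VECS[cls]
        # Take the max similarity between the class name and any caption word.
        scores[cls] = max((cosine(WORD_VECS[t], cls_vec) for t in tokens),
                          default=0.0)
    return scores

print(unseen_class_scores("a dog catches a disc on the grass", ["frisbee"]))
```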

Cited by 13 publications (6 citation statements) · References: 38 publications
“…Zero-shot image segmentation [2,11,13,17,28,33,35,41] aims at segmenting an image containing classes that are not seen during the training phase. The alignment between visual embeddings and text embeddings of categories is of great importance in this task.…”
Section: Related Work, 2.1 Zero-Shot Image Segmentation (citation type: mentioning)
Confidence: 99%
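The statement above stresses aligning visual and text embeddings. As a minimal sketch of that alignment idea (assumed shapes and names, not code from any of the cited papers), each pixel embedding can be compared against the text embeddings of all class names, with the best-matching class taken as the prediction:

```python
# Hedged sketch of visual-text alignment for zero-shot segmentation.
import numpy as np

def segment_by_alignment(pixel_emb, class_text_emb):
    """pixel_emb: (H, W, D) visual embeddings; class_text_emb: (C, D)."""
    # L2-normalise so that dot products become cosine similarities.
    p = pixel_emb / (np.linalg.norm(pixel_emb, axis=-1, keepdims=True) + 1e-8)
    t = class_text_emb / (np.linalg.norm(class_text_emb, axis=-1, keepdims=True) + 1e-8)
    sim = p @ t.T                  # (H, W, C) pixel-class similarities
    return sim.argmax(axis=-1)     # (H, W) predicted class index per pixel

rng = np.random.default_rng(0)
mask = segment_by_alignment(rng.normal(size=(4, 4, 8)),   # toy 4x4 image, D=8
                            rng.normal(size=(3, 8)))      # 3 classes
print(mask.shape)                                          # (4, 4)
```

At test time the class set can include unseen categories, since only their text embeddings are required.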
“…[5] used a state-of-the-art segmentation network (DeepLab) and propagated information about unseen classes to pixel embeddings using Word2Vec, together with self-training. [41] leveraged image captions in order to segment unknown classes. [26] learned a generator to produce visual features from semantic word embeddings, similar to [5], but alternated between generating "good features" and maintaining the structural relations between categories in the text latent space, as before.…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
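The approach attributed to [5] above can be sketched as follows (a hedged illustration with made-up shapes, not the cited implementation): a segmentation backbone produces pixel features, a learned linear map projects them into the Word2Vec space, and each pixel is assigned the class whose word vector is nearest, which works even for classes never seen with pixel labels.

```python
# Minimal sketch: classify pixels by nearest class word vector after a
# learned projection into word-embedding space. Shapes are illustrative.
import numpy as np

def classify_pixels(features, projection, class_word_vecs):
    """features: (N, Df); projection: (Df, Dw); class_word_vecs: (C, Dw)."""
    projected = features @ projection   # pixel features in word-vector space
    # Nearest class word vector by Euclidean distance.
    dists = np.linalg.norm(projected[:, None, :] - class_word_vecs[None], axis=-1)
    return dists.argmin(axis=1)         # (N,) class index per pixel

rng = np.random.default_rng(1)
labels = classify_pixels(rng.normal(size=(10, 16)),    # 10 pixels, Df=16
                         rng.normal(size=(16, 300)),   # learned projection
                         rng.normal(size=(5, 300)))    # 5 classes, Dw=300
print(labels)
```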
“…Recent works have explored zero-shot object detection by learning to distinguish between background and novel object regions [24,25], synthesizing unseen class features [26] or using richer textual descriptions [56]. For pixel-level mask prediction, [57][58][59][60][61][62][63][64] perform zero-shot semantic segmentation while [27] tackles the challenging zero-shot instance segmentation task. Since these zero-shot methods only have access to base class annotations, they perform poorly on novel classes.…”
Section: Related Work (citation type: mentioning)
Confidence: 99%