Text Augmented Spatial Aware Zero-shot Referring Image Segmentation
Yucheng Suo,
Linchao Zhu,
Yi Yang
Abstract:In this paper, we study a challenging task of zero-shot referring image segmentation. This task aims to identify the instance mask that is most related to a referring expression without training on pixel-level annotations. Previous research takes advantage of pre-trained crossmodal models, e.g., CLIP, to align instancelevel masks with referring expressions. Yet, CLIP only considers the global-level alignment of image-text pairs, neglecting fine-grained matching between the referring sentence and local image re… Show more
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.