2022
DOI: 10.48550/arxiv.2207.01203
Preprint

R^2VOS: Robust Referring Video Object Segmentation via Relational Multimodal Cycle Consistency

Abstract: Referring video object segmentation (R-VOS) aims to segment the object masks in a video given a referring linguistic expression to the object. It is a recently introduced task attracting growing research attention. However, all existing works make a strong assumption: The object depicted by the expression must exist in the video, namely, the expression and video must have an object-level semantic consensus. This is often violated in real-world applications where an expression can be queried to false videos, an…

Citations: cited by 1 publication (1 citation statement)
References: 37 publications (67 reference statements)
“…Video object segmentation (VOS) can be categorized as unsupervised (Wang et al 2019; Ren et al 2021a), semisupervised (Wang et al 2021) and referring (Wu et al 2022; Li et al 2022c) VOS. The most relevant type to this work is the unsupervised VOS (UVOS) which aims to segment primary object regions from the background in videos.…”
Section: Video Object Segmentation
Citation type: mentioning
Confidence: 99%