2021
DOI: 10.48550/arxiv.2105.05838
Preprint
Breaking Shortcut: Exploring Fully Convolutional Cycle-Consistency for Video Correspondence Learning

Abstract: Previous cycle-consistency correspondence learning methods usually leverage image patches for training. In this paper, we present a fully convolutional method, which is simpler and more coherent with the inference process. While directly applying fully convolutional training results in model collapse, we study the underlying reason behind this collapse phenomenon, indicating that the absolute positions of pixels provide a shortcut to easily accomplish cycle-consistency, which hinders the learning of meaningful vis…
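The position shortcut the abstract describes can be illustrated with a toy two-frame cycle check. This is an illustrative sketch, not the paper's method: the `cycle_consistency_error` helper, the temperature of 10, and the use of one-hot "position-only" features are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cycle_consistency_error(feat_a, feat_b, start_idx):
    """Toy forward-backward cycle check between two frames.

    feat_a, feat_b: (N, C) per-pixel features for frames A and B.
    start_idx: index of the query pixel in frame A.
    Returns the probability mass that returns to the start pixel
    after tracking A -> B -> A; 1.0 means a perfect cycle.
    """
    affinity = feat_a @ feat_b.T                 # (N, N) similarity
    p_fwd = softmax(affinity * 10.0, axis=1)     # A -> B transition probs
    p_bwd = softmax(affinity.T * 10.0, axis=1)   # B -> A transition probs
    cycle = p_fwd @ p_bwd                        # A -> B -> A round trip
    return cycle[start_idx, start_idx]

# "Shortcut" features: each pixel encodes only its absolute position
# (one-hot position codes), carrying no appearance information at all.
pos = np.eye(8)
# The cycle still closes almost perfectly -- cycle-consistency alone
# cannot tell these degenerate features from meaningful ones.
print(cycle_consistency_error(pos, pos, start_idx=3))
```

The point of the toy example is that position-only features score near-perfect cycle consistency, which is exactly the collapse shortcut fully convolutional training must break.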

Cited by 3 publications (3 citation statements); references 34 publications.
“…Jabri et al [25] posed cycle-consistency as a random walk problem, allowing the model to obtain dense supervision from multi-frame video. Tang et al [65] proposed an extension that allowed for fully convolutional training. These approaches are trained on sparse patches and learn coarse-grained correspondences.…”
Section: Related Work
confidence: 99%
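The multi-frame random-walk formulation attributed to Jabri et al can be sketched as a "palindrome" walk: chain soft transitions forward through a feature sequence, then back again, and supervise the round-trip matrix toward the identity. This is a minimal sketch under assumed names (`transition`, `palindrome_walk`) and an assumed temperature; it is not either paper's actual implementation.

```python
import numpy as np

def transition(feat_a, feat_b, temp=0.07):
    """Row-stochastic affinity between per-pixel features of two frames."""
    logits = (feat_a @ feat_b.T) / temp
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def palindrome_walk(frames):
    """Chain transitions forward through `frames` and back again.

    frames: list of (N, C) per-pixel feature arrays, one per video frame.
    Returns the (N, N) round-trip matrix; its diagonal is close to 1
    when correspondences are cycle-consistent, so a cross-entropy loss
    against the identity gives dense multi-frame supervision.
    """
    hops = list(zip(frames, frames[1:]))
    walk = np.eye(frames[0].shape[0])
    for a, b in hops:                  # forward pass through the video
        walk = walk @ transition(a, b)
    for a, b in reversed(hops):        # backward pass on the reversed video
        walk = walk @ transition(b, a)
    return walk
```

In training, the loss would be the negative log of the diagonal of the returned matrix; here, perfectly matching features yield a near-identity round trip.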
“…The key idea is that after tracking a target from a given startpoint to an estimated endpoint, reversing the video and re-applying the tracker from the endpoint should lead back to the original startpoint. This core idea is typically combined with patch-level affinity matrices, which can be traversed with spatial transformers [11], region-level motion averages [10], or random walks [12,23,24]. Unlike existing self-supervision works which initialize weights from ImageNet [25] or randomly, our method begins with state-of-the-art models whose architectures and weights are pre-optimized for motion estimation [13,14] and attempts to further refine them on a new test domain.…”
Section: Related Work
confidence: 99%
“…The Space-Time Memory Network [35] memorizes intermediate frames with segmentation masks as references and performs pixel-level matching between them and the current frame to segment target objects in a bottom-up manner, which has proved effective and has served as the current mainstream framework. Some works [40,23,5,15,59,41,51,6,62,46,25,27] further develop STM and have achieved excellent performance.…”
Section: Introduction
confidence: 99%
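The pixel-level matching that this citation attributes to STM can be sketched as a single attention-style read from a space-time memory: current-frame keys attend over memorized keys, and the attention weights mix the memorized values (which carry the mask information). The helper name and shapes are assumptions for illustration, not the actual STM code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(query_key, mem_keys, mem_values):
    """Attention-style read from a space-time memory.

    query_key:  (Nq, C) keys for current-frame pixels
    mem_keys:   (Nm, C) keys for all memorized pixels (past frames)
    mem_values: (Nm, D) values carrying mask/appearance information
    Returns (Nq, D): for each query pixel, a soft mixture of memory
    values weighted by key similarity.
    """
    scale = np.sqrt(query_key.shape[1])
    attn = softmax(query_key @ mem_keys.T / scale, axis=1)  # (Nq, Nm)
    return attn @ mem_values
```

The read-out is then decoded into the current frame's mask; the memory simply grows as more intermediate frames and masks are stored.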