2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00268

Deformable Sprites for Unsupervised Video Decomposition

Cited by 40 publications (6 citation statements)
References 26 publications
“…Our encoder network is a ResNet-32-CIFAR10 [20], truncated after layer 3 with a Gaussian feature pooling described in the supplementary material. For our unsupervised experiments, we use as generator g_θ the U-Net architecture of Deformable Sprites [49], which converged quickly, and for our supervised experiments a 2-layer MLP similar to MarioNette [41], which produces sprites of higher quality. The networks π_θ and p_θ are single linear layers followed by layer normalization.…”
Section: Losses and Training Details
confidence: 99%
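
For concreteness, here is a minimal PyTorch sketch of the two architectural pieces quoted above: a head built from a single linear layer followed by layer normalization, and a 2-layer MLP sprite generator in the MarioNette spirit. All dimensions, module names, and the RGBA output convention are illustrative assumptions, not details from the cited paper.

```python
import torch
import torch.nn as nn

class LinearNormHead(nn.Module):
    """A head like the pi_theta / p_theta networks described above:
    one linear layer followed by layer normalization.
    feat_dim and out_dim are assumed, not taken from the paper."""
    def __init__(self, feat_dim: int = 64, out_dim: int = 64):
        super().__init__()
        self.linear = nn.Linear(feat_dim, out_dim)  # single linear layer
        self.norm = nn.LayerNorm(out_dim)           # layer normalization

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(self.linear(x))

class SpriteMLP(nn.Module):
    """A 2-layer MLP generator in the spirit of the MarioNette-style
    g_theta mentioned above; hidden width and sprite size are assumptions."""
    def __init__(self, latent_dim: int = 64, sprite_hw: int = 16, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, sprite_hw * sprite_hw * 4),  # RGBA sprite
        )
        self.sprite_hw = sprite_hw

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        out = torch.sigmoid(self.net(z))
        return out.view(-1, 4, self.sprite_hw, self.sprite_hw)
```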
“…The predominant way to identify the objects present in a scene is to segment two-dimensional images using extensive manual annotation (Kirillov et al., 2023; Wang et al., 2023a), but relying on human supervision introduces challenges and scales poorly to 3D data. As an alternative, an extensive line of work on unsupervised object discovery (Russell et al., 2006; Rubinstein et al., 2013; Oktay et al., 2018; Hénaff et al., 2022; Smith et al., 2022; Ye et al., 2022; Monnier et al., 2023) proposes different inductive biases (Locatello et al., 2019) that encourage awareness of objects in a scene. However, these approaches are largely restricted to either 2D images or constrained 3D data (Yu et al., 2021; Sajjadi et al., 2022), limiting their applicability to complex 3D scenes.…”
Section: Related Work
confidence: 99%
“…In general, many popular algorithms [4], [6]–[17] require supervised training on large-scale datasets to obtain segmentation masks. Alternatively, a number of works [25]–[27] in the offline setting employ a deep neural network to discover objects of interest in a completely unsupervised manner, in the spirit of traditional methods. Lu et al. [67] proposed a unified framework for unsupervised object segmentation that exploits the inherent consistency across adjacent frames in unlabeled videos.…”
Section: Related Work
confidence: 99%
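
The cross-frame consistency idea referenced above can be illustrated with a short PyTorch sketch: a soft mask predicted for frame t+1 is warped back to frame t using optical flow, and penalized for disagreeing with frame t's mask. This is a generic illustration under assumed tensor shapes, not Lu et al.'s actual formulation; any off-the-shelf flow estimator could supply `flow_t_to_t1`.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(mask_t, mask_t1, flow_t_to_t1):
    """mask_*: (B, 1, H, W) soft masks; flow_t_to_t1: (B, 2, H, W) forward flow."""
    B, _, H, W = mask_t.shape
    # Build a sampling grid that follows the flow from frame t into frame t+1.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=mask_t.device, dtype=torch.float32),
        torch.arange(W, device=mask_t.device, dtype=torch.float32),
        indexing="ij",
    )
    # Normalize sampled coordinates to [-1, 1] as grid_sample expects.
    grid_x = (xs + flow_t_to_t1[:, 0]) / (W - 1) * 2 - 1
    grid_y = (ys + flow_t_to_t1[:, 1]) / (H - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1)              # (B, H, W, 2)
    # Warp frame t+1's mask back to frame t and compare.
    warped = F.grid_sample(mask_t1, grid, align_corners=True)
    return F.l1_loss(mask_t, warped)
```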
“…DyStaB employs static and dynamic models to learn object saliency from motion in a video, which can then be applied at inference time to segment objects, even in static images [25]. Deformable Sprites (DeSprites) [27] is a video autoencoder model that is optimized on each individual video. Our work also optimizes an autoencoder on a specific sequence in an unsupervised manner.…”
Section: Related Work
confidence: 99%
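
A minimal sketch of this per-sequence, unsupervised fitting, assuming a generic PyTorch autoencoder and a plain reconstruction objective; the optimizer choice, learning rate, and step count are placeholders rather than the settings of either paper.

```python
import torch
import torch.nn as nn

def fit_autoencoder_on_video(model: nn.Module, frames: torch.Tensor,
                             steps: int = 2000, lr: float = 1e-4) -> nn.Module:
    """Fit `model` to one specific video. frames: (T, 3, H, W), no labels used."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        recon = model(frames)                        # reconstruct every frame
        loss = nn.functional.mse_loss(recon, frames)  # unsupervised objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model  # now specialized to this one sequence
```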