2021
DOI: 10.48550/arxiv.2109.06737
Preprint

Comparing Reconstruction- and Contrastive-based Models for Visual Task Planning

Abstract: Learning state representations enables robotic planning directly from raw observations such as images. Most methods learn state representations by utilizing losses based on the reconstruction of the raw observations from a lower-dimensional latent space. The similarity between observations in the space of images is often assumed and used as a proxy for estimating similarity between the underlying states of the system. However, observations commonly contain task-irrelevant factors of variation which are nonethel…

Cited by 2 publications (4 citation statements)
References 23 publications
“…For Q_2, the conservative approach resulted in B = 85% due to 77 points being assigned to a component despite their wrong label c_i, i ≥ 7 (thus leading to A = 0%), suggesting a moderate separation of classes. For Q_3, we see that A = 8% and B = 28% in the conservative approach, which indicates that the model was not able to recognize similar box configurations recorded from different camera views, in line with the results obtained by Chamzas et al. (2021). Lastly, we observe that the flexible approach yielded only a minor decrease in B for Q_2 and Q_3.…”
Section: Hyperparameters (supporting)
confidence: 80%
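The A and B scores above are not defined in this excerpt; purely as an illustration, the sketch below assumes B is the fraction of points assigned to any cluster at all, and A the fraction assigned to a cluster whose majority ground-truth label matches their own. Both the definitions and the function name are hypothetical.

```python
import numpy as np
from collections import Counter

def assignment_scores(cluster_ids, labels):
    """Hypothetical A/B scores for a clustering of latent points.

    B: fraction of points assigned to some cluster (HDBSCAN marks
       unassigned/noise points with -1).
    A: fraction of points assigned to a cluster whose majority
       ground-truth label matches the point's own label.
    """
    cluster_ids = np.asarray(cluster_ids)
    labels = np.asarray(labels)
    assigned = cluster_ids != -1
    B = assigned.mean()

    # Majority ground-truth label of each cluster, over assigned points only.
    majority = {
        c: Counter(labels[cluster_ids == c]).most_common(1)[0][0]
        for c in np.unique(cluster_ids[assigned])
    }
    matches = [majority[c] == y for c, y in zip(cluster_ids[assigned], labels[assigned])]
    A = np.sum(matches) / len(labels) if len(labels) else 0.0
    return A, B
```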
“…Typically, a classification problem is used to evaluate either the ability of a model to recover labels of raw inputs, or the transferability of its representations to other domains, as done in state-of-the-art unsupervised representation learning methods (Chen et al., 2020b; Ermolov et al., 2021). However, in many scenarios such a straightforward downstream classification task cannot be defined, for instance because it does not represent the nature of the application, or because of the scarcity of labeled data that often occurs in robotics (Chamzas et al., 2021; Lippi et al., 2020). In these scenarios, representations are commonly evaluated on hand-crafted downstream tasks, e.g., specific robotics tasks.…”
Section: Introduction (mentioning)
confidence: 99%
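The downstream-classification evaluation described here is often realized as a linear probe on frozen representations. The sketch below is a generic illustration of that idea, not the authors' protocol; `encode` stands in for any frozen encoder and the data arrays are placeholders.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def linear_probe_accuracy(encode, x_train, y_train, x_test, y_test):
    """Fit a linear classifier on frozen representations and report test
    accuracy -- a common proxy for representation quality when a natural
    downstream classification task is available."""
    z_train = encode(x_train)  # `encode` is a placeholder frozen encoder
    z_test = encode(x_test)
    clf = LogisticRegression(max_iter=1000).fit(z_train, y_train)
    return accuracy_score(y_test, clf.predict(z_test))
```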
“…We perform a grid search for w_ε in the interval [−0.65, −0.05] with step size 0.1. The LPM is a two-layer, 100-node MLP, while the Siamese network for the SM is a shallow network with two convolutional layers and a latent space of dimension 12, as in [17]. We train the Siamese network for 100 epochs and perform HDBSCAN [18] clustering in the latent space Z of the model.…”
Section: A. Evaluation Criteria and Implementation Details (mentioning)
confidence: 99%
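For concreteness, the sketch below pairs a shallow two-convolutional-layer Siamese encoder with a 12-dimensional latent space with HDBSCAN clustering on the embeddings. Kernel sizes, channel counts, the 64x64 input resolution, `min_cluster_size`, and the training loss are assumptions not stated in the excerpt, so this is only a rough approximation of the setup described above.

```python
import torch
import torch.nn as nn
import hdbscan  # pip install hdbscan

class SiameseEncoder(nn.Module):
    """Shallow encoder with two convolutional layers and a 12-d latent space.
    Channel counts, kernel sizes, and the 64x64 RGB input are assumptions."""
    def __init__(self, latent_dim=12):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(32 * 16 * 16, latent_dim)

    def forward(self, x):
        return self.fc(self.conv(x))

# After training the Siamese network (e.g. for 100 epochs with a contrastive
# loss on observation pairs), embed the observations and cluster the latent
# codes Z with HDBSCAN.
encoder = SiameseEncoder()
observations = torch.rand(256, 3, 64, 64)   # placeholder batch of images
with torch.no_grad():
    Z = encoder(observations).numpy()
cluster_ids = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(Z)  # -1 = noise
```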