Perceiving and handling deformable objects is an integral part of everyday life for humans. Automating tasks such as food handling, garment sorting, or assistive dressing requires solving open problems in modeling, perception, planning, and control. Recent advances in data-driven approaches, combined with classical control and planning, can provide viable solutions to these open challenges. In addition, the development of better simulation environments lets us generate and study scenarios that allow benchmarking of various approaches and a better understanding of which theoretical developments need to be made and how practical systems can be implemented and evaluated to provide flexible, scalable, and robust solutions. To this end, we survey more than 100 relevant studies in this area and use them as the basis for discussing open problems. We adopt a learning perspective to unify the discussion of analytical and data-driven approaches, addressing how to use and integrate model priors and task data in perceiving and manipulating a variety of deformable objects.
We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on manipulation of deformable objects. We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure that globally captures the system dynamics in a low-dimensional latent space. Our framework consists of three parts: (1) a Mapping Module (MM) that maps observations, given in the form of images, into a structured latent space from which the respective states are extracted, and that generates observations from latent states, (2) the LSR, which builds and connects clusters of similar states in order to find latent plans between start and goal states extracted by the MM, and (3) the Action Proposal Module, which complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of our framework on two simulated box-stacking tasks and a folding task executed on a real robot.
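To make the planning step concrete, below is a minimal sketch of how an LSR-style roadmap could be built over latent states and queried for a plan. All names (build_roadmap, plan, the distance threshold eps) are illustrative assumptions; the actual framework clusters the latent states produced by the Mapping Module and uses a separately trained Action Proposal Module to attach actions to each latent transition.

```python
# Sketch of a Latent Space Roadmap-style planner. Assumes latent_states
# were produced by a trained encoder and transitions are index pairs of
# observed (before, after) states; this is a simplification of the LSR.
import numpy as np
import networkx as nx

def build_roadmap(latent_states, transitions, eps=0.5):
    """Build a graph whose nodes are latent states and whose edges are
    observed transitions plus links between nearby states."""
    g = nx.Graph()
    for i, z in enumerate(latent_states):
        g.add_node(i, z=z)
    for i, j in transitions:          # edges from demonstrated transitions
        g.add_edge(i, j)
    for i in range(len(latent_states)):   # merge states close in latent space
        for j in range(i + 1, len(latent_states)):
            if np.linalg.norm(latent_states[i] - latent_states[j]) < eps:
                g.add_edge(i, j)
    return g

def plan(g, latent_states, z_start, z_goal):
    """Find a latent plan by routing through the roadmap between the
    nodes nearest to the encoded start and goal observations."""
    start = int(np.argmin([np.linalg.norm(z - z_start) for z in latent_states]))
    goal = int(np.argmin([np.linalg.norm(z - z_goal) for z in latent_states]))
    path = nx.shortest_path(g, start, goal)
    return [latent_states[i] for i in path]
```

In the full framework the roadmap nodes would correspond to clusters of similar latent states rather than individual samples, and edges would only connect clusters linked by observed actions.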
Learning state representations enables robotic planning directly from raw observations such as images. Most methods learn state representations using losses based on reconstructing the raw observations from a lower-dimensional latent space. Similarity between observations in image space is often assumed and used as a proxy for similarity between the underlying states of the system. However, observations commonly contain task-irrelevant factors of variation that are nonetheless important for reconstruction, such as varying lighting and different camera viewpoints. In this work, we define relevant evaluation metrics and perform a thorough study of different loss functions for state representation learning. We show that models exploiting task priors, such as Siamese networks with a simple contrastive loss, outperform reconstruction-based representations in visual task planning.
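As an illustration of the task-prior idea, the sketch below implements a standard Siamese contrastive loss in PyTorch; the encoder architecture, margin value, and pairing labels are placeholder assumptions rather than the exact configuration studied in the paper.

```python
# Minimal Siamese contrastive loss sketch. Assumes `encoder` is any
# network mapping images to latent vectors and `same_state` is a float
# tensor with 1 for pairs showing the same underlying state, 0 otherwise.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseContrastive(nn.Module):
    def __init__(self, encoder: nn.Module, margin: float = 1.0):
        super().__init__()
        self.encoder = encoder
        self.margin = margin

    def forward(self, x1, x2, same_state):
        z1, z2 = self.encoder(x1), self.encoder(x2)
        d = F.pairwise_distance(z1, z2)
        # pull together observations of the same state, push apart the rest
        loss = same_state * d.pow(2) + \
               (1 - same_state) * F.relu(self.margin - d).pow(2)
        return loss.mean()
```

The key design choice is that the supervision signal is whether two observations depict the same underlying state, so task-irrelevant factors such as lighting or viewpoint are not rewarded the way they are by a reconstruction loss.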
Advanced representation learning techniques require reliable and general evaluation methods. Recently, several algorithms based on the common idea of geometric and topological analysis of a manifold approximated from the learned data representations have been proposed. In this work, we introduce Delaunay Component Analysis (DCA), an evaluation algorithm which approximates the data manifold using a more suitable neighbourhood graph called the Delaunay graph. This provides a reliable manifold estimation even for challenging geometric arrangements of representations, such as clusters with varying shape and density as well as outliers, where existing methods often fail. Furthermore, we exploit the nature of Delaunay graphs and introduce a framework for assessing the quality of individual novel data representations. We experimentally validate the proposed DCA method on representations obtained from neural networks trained with a contrastive objective, supervised models, and generative models, and demonstrate various use cases of our extended single-point evaluation framework.

Recently, more general evaluation methods such as Geometry Score (GS) (Khrulkov & Oseledets, 2018), Improved Precision and Recall (IPR) (Kynkäänniemi et al., 2019) and Geometric Component Analysis (GeomCA) (Poklukar et al., 2021) have been proposed.
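The core construction can be illustrated as follows: build a graph over the joint set of reference and evaluated representations using Delaunay edges, prune long edges so that the graph can decompose into components, and inspect how the two sets mix within those components. The sketch below uses SciPy's exact Delaunay triangulation, which is only tractable in low dimensions; the published DCA method relies on an approximate Delaunay graph, a more principled edge-distillation step, and richer per-component scores, so this is a simplified stand-in.

```python
# Simplified Delaunay-graph comparison of two representation sets,
# in the spirit of DCA. All function names and the max_edge pruning
# heuristic are illustrative assumptions.
import numpy as np
import networkx as nx
from scipy.spatial import Delaunay

def delaunay_graph(points, max_edge=None):
    """Connect points sharing a Delaunay edge; optionally drop edges
    longer than max_edge so the graph can split into components."""
    tri = Delaunay(points)
    g = nx.Graph()
    g.add_nodes_from(range(len(points)))
    for simplex in tri.simplices:
        for i in range(len(simplex)):
            for j in range(i + 1, len(simplex)):
                a, b = int(simplex[i]), int(simplex[j])
                if max_edge is None or np.linalg.norm(points[a] - points[b]) <= max_edge:
                    g.add_edge(a, b)
    return g

def component_consistency(ref, evl, max_edge):
    """Fraction of connected components containing both reference and
    evaluated points -- a crude proxy for how well the two sets align."""
    pts = np.vstack([ref, evl])
    labels = np.array([0] * len(ref) + [1] * len(evl))
    g = delaunay_graph(pts, max_edge=max_edge)
    comps = list(nx.connected_components(g))
    mixed = sum(1 for c in comps if len(set(labels[list(c)])) == 2)
    return mixed / len(comps)
```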
Variational auto-encoders (VAEs) are deep generative models used for unsupervised learning; however, their standard version is not topology-aware in practice, since the data topology may not be taken into consideration. In this paper, we propose two approaches that aim to preserve the topological structure between the input space and the latent representation of a VAE. First, we introduce InvMap-VAE as a way to turn any dimensionality reduction technique, given an embedding it produces, into a generative model within a VAE framework, providing an inverse mapping into the original space. Second, we propose the Witness Simplicial VAE as an extension of the simplicial auto-encoder to the variational setup, using a witness complex to compute the simplicial regularization, and we motivate this method theoretically using tools from algebraic topology. The Witness Simplicial VAE is independent of any dimensionality reduction technique and, together with its extension, Isolandmarks Witness Simplicial VAE, preserves the persistent Betti numbers of a dataset better than a standard VAE.
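As a rough illustration of the InvMap-VAE idea, the objective below augments a standard VAE loss with a term that ties the latent means to a precomputed embedding (for example from Isomap or UMAP), so the decoder effectively learns an inverse mapping from the embedding back to the input space. The function name, weighting, and the use of the latent mean as the anchor are illustrative assumptions rather than the paper's exact formulation, and the witness-complex regularization of the Witness Simplicial VAE is not shown.

```python
# Sketch of an InvMap-VAE-style objective: standard ELBO terms plus a
# term matching latent means to a given low-dimensional embedding.
# `target_embedding` is assumed to come from any dimensionality
# reduction method applied to the same batch of inputs.
import torch
import torch.nn.functional as F

def invmap_vae_loss(x, x_recon, mu, logvar, target_embedding, gamma=1.0):
    recon = F.mse_loss(x_recon, x, reduction="sum")                   # reconstruction
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())      # KL divergence
    embed = F.mse_loss(mu, target_embedding, reduction="sum")         # match embedding
    return recon + kl + gamma * embed
```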