The majority of approaches for acquiring dense 3D environment maps with RGB-D cameras assume static environments or reject moving objects as outliers. The representation and tracking of moving objects, however, have significant potential for applications in robotics and augmented reality. In this paper, we propose a novel approach to dynamic SLAM with dense object-level representations. We represent rigid objects in local volumetric signed distance function (SDF) maps and formulate multi-object tracking as direct alignment of RGB-D images with the SDF representations. Our main novelty is a probabilistic formulation which naturally leads to strategies for data association and occlusion handling. In experiments, we analyze our approach and demonstrate that it compares favorably with state-of-the-art methods in terms of robustness and accuracy.
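To make the core idea of direct RGB-D-to-SDF alignment concrete, the following is a minimal sketch of evaluating alignment residuals of a depth point cloud against a voxelized SDF under a hypothesized object pose. It is not the paper's implementation; the grid layout, voxel scale, and function names are illustrative assumptions.

```python
# Minimal sketch (assumed interfaces): SDF residuals for rigid object tracking.
import numpy as np
from scipy.ndimage import map_coordinates

def sdf_residuals(points_cam, T_obj_cam, sdf, voxel_size, origin):
    """Evaluate the SDF at depth points transformed into the object volume.

    points_cam : (N, 3) points back-projected from an RGB-D frame (camera frame)
    T_obj_cam  : (4, 4) rigid transform from camera frame to object volume frame
    sdf        : (X, Y, Z) voxel grid of signed distances
    voxel_size : edge length of one voxel (meters)
    origin     : (3,) coordinates of voxel (0, 0, 0) in the object frame
    """
    # Transform points into the object's volume frame.
    pts_h = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    pts_obj = (T_obj_cam @ pts_h.T).T[:, :3]
    # Convert to fractional voxel coordinates and trilinearly interpolate the SDF.
    vox = (pts_obj - origin) / voxel_size
    return map_coordinates(sdf, vox.T, order=1, mode='nearest')

# Minimizing a robustified sum of squared residuals over the pose parameters
# is the essence of aligning an RGB-D frame with a volumetric SDF model.
```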
We present an efficient multi-resolution approach to segment a 3D point cloud into planar components. In order to gain efficiency, we process large point clouds iteratively from coarse to fine 3D resolutions: At each resolution, we rapidly extract surface normals to describe surface elements (surfels). We group surfels that cannot be associated with planes from coarser resolutions into coplanar clusters with the Hough transform. We then extract connected components on these clusters and determine a best plane fit through RANSAC. Finally, we merge plane segments and refine the segmentation on the finest resolution. In experiments, we demonstrate the efficiency and quality of our method and compare it to other state-of-the-art approaches.
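As an illustration of one stage of such a pipeline, below is a hedged sketch of a RANSAC plane fit applied to a coplanar point cluster; the iteration count and inlier threshold are illustrative choices, not the paper's settings.

```python
# Minimal sketch (assumed parameters): RANSAC plane fit for one point cluster.
import numpy as np

def ransac_plane(points, n_iters=200, inlier_thresh=0.01, seed=None):
    """Fit a plane n.x + d = 0 (unit normal n) to a (N, 3) point cluster."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iters):
        # Sample three points and form a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # skip degenerate (collinear) samples
            continue
        n /= norm
        d = -n @ p0
        # Count points within the distance threshold of the candidate plane.
        inliers = np.abs(points @ n + d) < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine with a least-squares fit (SVD) on the inlier set.
    centroid = points[best_inliers].mean(axis=0)
    _, _, vt = np.linalg.svd(points[best_inliers] - centroid)
    n = vt[-1]
    return n, -n @ centroid, best_inliers
```

In a coarse-to-fine scheme, such a fit would only be run on clusters that could not already be explained by planes found at coarser resolutions.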
Video representation learning has recently attracted attention in computer vision due to its applications in activity and scene forecasting and in vision-based planning and control. Video prediction models often learn a latent representation of video which is encoded from input frames and decoded back into images. Even when conditioned on actions, purely deep learning based architectures typically lack a physically interpretable latent space. In this study, we use a differentiable physics engine within an action-conditional video representation network to learn a physical latent representation. We propose supervised and self-supervised learning methods to train our network and identify physical properties. The latter uses spatial transformers to decode physical states back into images. The simulation scenarios in our experiments comprise pushing, sliding and colliding objects, for which we also analyze the observability of the physical properties. In experiments, we demonstrate that our network can learn to encode images and identify physical properties like mass and friction from videos and action sequences in the simulated scenarios. We evaluate the accuracy of our supervised and self-supervised methods and compare it with a system identification baseline which directly learns from state trajectories. We also demonstrate the ability of our method to predict future video frames from input images and actions.
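To illustrate why a differentiable physics engine lets gradients reach physical parameters such as mass and friction, here is a toy single-body "push with friction" step written in PyTorch. This is a deliberately simplified stand-in, not the engine used in the paper; the dynamics model, parameter names, and target value are assumptions for the example.

```python
# Minimal sketch (toy dynamics): a differentiable push-with-friction step.
import torch

def push_step(pos, vel, force, mass, friction, dt=0.05, g=9.81):
    """One Euler step of a block pushed on a plane with smoothed Coulomb friction."""
    # Friction opposes motion; tanh gives a differentiable approximation of sign(vel).
    friction_acc = -friction * g * torch.tanh(vel / 1e-2)
    acc = force / mass + friction_acc
    vel_next = vel + dt * acc
    pos_next = pos + dt * vel_next
    return pos_next, vel_next

# Physical parameters are leaf tensors, so losses defined on predicted states
# (or on decoded frames, as in the paper) back-propagate into them.
mass = torch.tensor(1.0, requires_grad=True)
friction = torch.tensor(0.3, requires_grad=True)
pos, vel = torch.tensor(0.0), torch.tensor(0.0)
for f in [2.0, 2.0, 0.0, 0.0]:                    # a short action (force) sequence
    pos, vel = push_step(pos, vel, torch.tensor(f), mass, friction)
loss = (pos - 0.12) ** 2                          # hypothetical observed final position
loss.backward()                                   # gradients flow into mass and friction
```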
Scene understanding from images is a challenging problem encountered in autonomous driving. On the object level, while 2D methods have gradually evolved from computing simple bounding boxes to delivering finer-grained results like instance segmentations, the 3D family is still dominated by estimating 3D bounding boxes. In this paper, we propose a novel approach to jointly infer the 3D rigid-body poses and shapes of vehicles from a stereo image pair using shape priors. Unlike previous works that geometrically align shapes to point clouds from dense stereo reconstruction, our approach works directly on images by combining a photometric and a silhouette alignment term in the energy function. An adaptive sparse point selection scheme is proposed to efficiently measure the consistency with both terms. In experiments, we show superior performance of our method on 3D pose and shape estimation over the previous geometric approach and demonstrate that our method can also be applied as a refinement step to significantly boost the performance of several state-of-the-art deep learning based 3D object detectors. All related materials and demonstration videos are available at the project page https://vision.in.tum.de/research/vslam/direct-shape.
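To give a feel for a photometric stereo-consistency term evaluated at a sparse set of surface points, here is a hedged sketch assuming a rectified stereo pair with shared intrinsics; the camera model, sampling, and function names are illustrative and not taken from the paper.

```python
# Minimal sketch (assumed camera model): photometric residuals for sparse surface points.
import numpy as np
from scipy.ndimage import map_coordinates

def photometric_residuals(points_obj, T_cam_obj, K, baseline, img_left, img_right):
    """Intensity differences between left/right projections of object surface points.

    points_obj : (N, 3) sampled points on the hypothesized object surface (object frame)
    T_cam_obj  : (4, 4) object-to-left-camera rigid-body pose
    K          : (3, 3) intrinsics shared by the rectified stereo pair
    baseline   : stereo baseline in meters
    """
    pts_h = np.hstack([points_obj, np.ones((len(points_obj), 1))])
    pts_cam = (T_cam_obj @ pts_h.T).T[:, :3]
    # Project into the left image, and into the right image after the baseline shift.
    uv_l = (K @ pts_cam.T).T
    uv_l = uv_l[:, :2] / uv_l[:, 2:3]
    pts_right = pts_cam - np.array([baseline, 0.0, 0.0])
    uv_r = (K @ pts_right.T).T
    uv_r = uv_r[:, :2] / uv_r[:, 2:3]
    # Bilinear intensity lookup (map_coordinates expects (row, col) order).
    i_l = map_coordinates(img_left,  [uv_l[:, 1], uv_l[:, 0]], order=1, mode='nearest')
    i_r = map_coordinates(img_right, [uv_r[:, 1], uv_r[:, 0]], order=1, mode='nearest')
    # Summing these residuals robustly gives a photometric energy term; a silhouette
    # term over the projected shape outline would be added on top of it.
    return i_l - i_r
```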