“…Alternatively, segmentation methods like Refs. [19][20][21] use tracked sparse features to perform motion consistency analysis and motion segmentation; dense approaches taking RGBD input [5][6][7][8][22] combine the registration residual of dense model alignment with geometric features for enhanced segmentation and tracking. Further techniques for dynamic SLAM are summarized in Ref.…”
Section: Visual SLAM in Dynamic Environments
We present a practical backend for stereo visual SLAM that simultaneously discovers individual rigid bodies and computes their motions in dynamic environments. While recent factor-graph-based state optimization algorithms have shown their ability to solve SLAM problems robustly by treating dynamic objects as outliers, the motions of those dynamic objects are rarely considered. In this paper, we exploit the consensus of 3D motions among landmarks extracted from the same rigid body to cluster them and to identify static and dynamic objects in a unified manner. Specifically, our algorithm builds a noise-aware motion affinity matrix from the landmarks and uses agglomerative clustering to distinguish rigid bodies. Using decoupled factor graph optimization to refine their shapes and trajectories, we obtain an iterative scheme that updates cluster assignments and motion estimates reciprocally. Evaluations on both synthetic scenes and KITTI demonstrate the capability of our approach, and further experiments on online efficiency show the effectiveness of our method for simultaneously tracking ego-motion and multiple objects.
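To make the clustering step concrete, the following is a minimal sketch (not the paper's implementation) of grouping landmarks into rigid bodies from motion consistency alone: rigid motion preserves pairwise 3D distances, so the change in distance between two tracked landmarks measures how inconsistent their motions are. The Gaussian affinity form, the noise scale `sigma`, and the average-linkage cut threshold are illustrative assumptions, not values from the paper.

```python
# Sketch: cluster 3D landmarks into rigid bodies via a noise-aware motion
# affinity matrix and agglomerative clustering.  Parameters are illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def rigid_body_clusters(pts_prev, pts_curr, sigma=0.05, cut=0.5):
    """pts_prev, pts_curr: (N, 3) landmark positions in consecutive frames."""
    # Rigid motion preserves pairwise distances, so the change in distance
    # between two landmarks measures how inconsistent their motions are.
    d_prev = np.linalg.norm(pts_prev[:, None, :] - pts_prev[None, :, :], axis=-1)
    d_curr = np.linalg.norm(pts_curr[:, None, :] - pts_curr[None, :, :], axis=-1)
    inconsistency = np.abs(d_curr - d_prev)        # (N, N), ~0 within one rigid body
    # Noise-aware affinity: distance changes within the noise scale sigma
    # still give high affinity.
    affinity = np.exp(-(inconsistency / sigma) ** 2)
    # Agglomerative (average-linkage) clustering on the dissimilarity 1 - affinity.
    dissim = squareform(1.0 - affinity, checks=False)
    labels = fcluster(linkage(dissim, method="average"), t=cut, criterion="distance")
    return labels                                  # cluster id per landmark

# Example: a static point cloud plus a rigid object translated by 0.3 m.
np.random.seed(0)
static = np.random.rand(20, 3)
moving = np.random.rand(10, 3) + 5.0
pts_prev = np.vstack([static, moving])
pts_curr = np.vstack([static, moving + np.array([0.3, 0.0, 0.0])])
print(rigid_body_clusters(pts_prev, pts_curr))     # two distinct cluster labels
```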
“…Existing work on 3D instance recovery from images. 3D objects are usually recovered from multiple frames, from 3D range sensors [26], or with learning-based methods [67,13]. Nevertheless, addressing 3D instance understanding from a single image in an uncontrolled environment is ill-posed and challenging, and is thus attracting growing attention.…”
Figure 1: An example of our dataset, where (a) is the input color image, (b) illustrates the labeled 2D keypoints, and (c) shows the 3D model fitting result obtained with the labeled 2D keypoints.
“…While there is a large amount of research dedicated to real-world perception in this domain [15,25,26,12,35,29,13,9], there is a surprising lack of work on entity state prediction in the same domain (see Section 2), which we attribute to two main causes:…”
We focus on the problem of predicting future states of entities in complex, real-world driving scenarios. Previous research has used low-level signals to predict over short time horizons, and has not addressed how to leverage key assets relied upon heavily by industry self-driving systems: (1) large 3D perception efforts that provide highly accurate 3D states of agents with rich attributes, and (2) detailed and accurate semantic maps of the environment (lanes, traffic lights, crosswalks, etc.). We present a unified representation that encodes such high-level semantic information in a spatial grid, allowing the use of deep convolutional models to fuse complex scene context. This enables learning entity-entity and entity-environment interactions with simple, feed-forward computations at each timestep within an overall temporal model of an agent's behavior. We propose different ways of modelling the future as a distribution over future states using standard supervised learning. We introduce a novel dataset providing industry-grade rich perception and semantic inputs, and empirically show that we can effectively learn the fundamentals of driving behavior.
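As a rough illustration of the spatial-grid idea (a sketch under assumed grid size, resolution, and channel layout, not the paper's code), semantic map elements and agent states can be rasterized into a multi-channel, ego-centred image that a convolutional encoder can consume at each timestep:

```python
# Sketch: rasterize map layers and agent states into an ego-centred spatial grid.
# Grid size, resolution, and channel layout are illustrative assumptions.
import numpy as np

GRID = 128            # grid cells per side
RES = 0.5             # metres per cell
CHANNELS = {"lanes": 0, "crosswalks": 1, "agents": 2, "agent_speed": 3}

def world_to_grid(xy, ego_xy):
    """Map a world (x, y) point to integer grid indices in an ego-centred frame."""
    rel = (np.asarray(xy) - np.asarray(ego_xy)) / RES
    return np.round(rel + GRID / 2).astype(int)

def rasterize(ego_xy, lane_pts, crosswalk_pts, agents):
    """agents: list of dicts with 'xy' position and 'speed'."""
    grid = np.zeros((len(CHANNELS), GRID, GRID), dtype=np.float32)
    for name, pts in (("lanes", lane_pts), ("crosswalks", crosswalk_pts)):
        for p in pts:
            i, j = world_to_grid(p, ego_xy)
            if 0 <= i < GRID and 0 <= j < GRID:
                grid[CHANNELS[name], i, j] = 1.0
    for a in agents:
        i, j = world_to_grid(a["xy"], ego_xy)
        if 0 <= i < GRID and 0 <= j < GRID:
            grid[CHANNELS["agents"], i, j] = 1.0
            grid[CHANNELS["agent_speed"], i, j] = a["speed"]
    return grid       # (C, H, W) tensor, ready for a convolutional encoder

# Example: one lane sampled along x, a crosswalk, and two agents near the ego vehicle.
lane = [(x, 0.0) for x in np.arange(-30, 30, 0.5)]
crosswalk = [(10.0, y) for y in np.arange(-3, 3, 0.5)]
agents = [{"xy": (5.0, 0.0), "speed": 8.0}, {"xy": (12.0, -1.0), "speed": 1.5}]
frame = rasterize(ego_xy=(0.0, 0.0), lane_pts=lane, crosswalk_pts=crosswalk, agents=agents)
print(frame.shape)    # (4, 128, 128)
```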