Abstract: Image-based reconstruction of urban environments is a challenging problem that involves optimizing a large number of variables and has several sources of error, such as the presence of dynamic objects. Since most large-scale approaches assume that the observed scene is static, dynamic objects are relegated to the noise-modeling section of such systems. This is an approach of convenience, since the RANSAC-based framework used to compute most multiview geometric quantities for static scenes naturally confines dynamic objects to the class of outlier measurements. However, reconstructing dynamic objects along with the static environment gives a complete picture of an urban environment. Such understanding can then be used for important robotic tasks such as path planning for autonomous navigation and obstacle tracking and avoidance, among others. In this paper, we propose a system for robust SLAM that works in both static and dynamic environments. To overcome the challenge of dynamic objects in the scene, we propose a new model that incorporates semantic constraints into the reconstruction algorithm. Some of these constraints are based on multi-layered dense CRFs trained over appearance as well as motion cues; others are expressed as additional terms in the bundle adjustment optimization, which iteratively refines the 3D structure and the camera and object motion trajectories. We show results on the challenging KITTI urban dataset for the accuracy of motion segmentation and for the reconstruction of the trajectory and shape of moving objects relative to ground truth. We show a significant reduction in average relative error for moving object trajectory reconstruction relative to state-of-the-art methods like VISO 2 [16], as well as standard bundle adjustment algorithms.
While the literature has been fairly dense in the areas of scene
understanding and semantic labeling, there have been few works that make use of
motion cues to improve semantic labeling performance and vice versa. In this
paper, we address the problem of semantic motion segmentation, and show how
semantic and motion priors augment performance. We propose an algorithm that
jointly infers the semantic class and motion label of an object. Integrating
semantic, geometric and optical flow based constraints into a dense CRF model,
we infer both the object class and the motion class for each pixel. We found
an improvement in performance using a fully connected CRF as compared to a
standard clique-based CRF. For inference, we use a Mean Field approximation
based algorithm. Our method outperforms recently proposed motion detection
algorithms and also improves the semantic labeling compared to the
state-of-the-art Automatic Labeling Environment algorithm on the challenging
KITTI dataset, especially for object classes such as pedestrians and cars that
are critical to an outdoor robotic navigation scenario.
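
As a rough illustration of the inference step named above, the following is a minimal sketch of naive mean-field inference in a fully connected CRF whose labels are joint (semantic class, motion class) pairs. It is not the authors' implementation: the Potts compatibility, the single Gaussian kernel, the function name `mean_field`, the toy unary costs and the 1-D features are all assumptions chosen for brevity, and the pairwise sum is computed in O(N^2) rather than with the efficient filtering a real dense CRF would use.

```python
import math
from itertools import product

# Hypothetical joint label space: every (semantic class, motion class) pair.
JOINT_LABELS = list(product(["road", "car"], ["static", "moving"]))


def softmax_neg(costs):
    """Turn a list of costs into probabilities proportional to exp(-cost)."""
    m = min(costs)  # subtract the minimum for numerical stability
    e = [math.exp(-(c - m)) for c in costs]
    z = sum(e)
    return [v / z for v in e]


def mean_field(unary, features, n_iters=10, weight=1.0, bandwidth=1.0):
    """Naive mean-field inference for a fully connected CRF.

    unary[i][l] is the cost of label l at pixel i (assumed to come from
    some classifier); features[i] is a feature tuple for pixel i.
    Assumes a Potts compatibility and one Gaussian kernel.
    """
    n, n_labels = len(unary), len(unary[0])

    def kernel(i, j):
        d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        return weight * math.exp(-d2 / (2.0 * bandwidth ** 2))

    # Dense pairwise weights; a pixel does not send a message to itself.
    k = [[kernel(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]

    q = [softmax_neg(u) for u in unary]  # initialise from the unaries alone
    for _ in range(n_iters):
        new_q = []
        for i in range(n):
            costs = []
            for l in range(n_labels):
                # Potts model: pay k[i][j] times the probability that
                # pixel j takes a label different from l.
                pairwise = sum(k[i][j] * (1.0 - q[j][l]) for j in range(n))
                costs.append(unary[i][l] + pairwise)
            new_q.append(softmax_neg(costs))
        q = new_q  # parallel update of all pixels
    return q


# Toy example: pixels 0-2 are close in feature space, pixel 3 is far away.
# Pixel 1 weakly prefers (car, static) on its own, but its neighbours
# strongly prefer (car, moving).
unary = [
    [3.0, 3.0, 3.0, 0.0],
    [3.0, 3.0, 0.9, 1.0],
    [3.0, 3.0, 3.0, 0.0],
    [0.0, 3.0, 3.0, 3.0],
]
features = [(0.0,), (0.1,), (0.2,), (5.0,)]

q = mean_field(unary, features)
labels = [JOINT_LABELS[max(range(len(row)), key=row.__getitem__)] for row in q]
print(labels)
```

In this toy setup the ambiguous pixel is pulled toward the joint label of its neighbours in feature space, while the distant pixel keeps its own label. Treating the class and motion labels as one product label is the simplest way to infer them jointly; it is only one possible reading of the model described above.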