Autonomous vehicles operate in highly dynamic environments, necessitating an accurate assessment of which aspects of a scene are moving and where they are moving to. A popular approach to 3D motion estimation, termed scene flow, is to employ 3D point cloud data from consecutive LiDAR scans, although such approaches have been limited by the small size of real-world, annotated LiDAR data. In this work, we introduce a new large-scale dataset for scene flow estimation derived from corresponding tracked 3D objects, which is ∼1,000× larger than previous real-world datasets in terms of the number of annotated frames. We demonstrate how previous works were bounded by the amount of real LiDAR data available, suggesting that larger datasets are required to achieve state-of-the-art predictive performance. Furthermore, we show how previous heuristics such as down-sampling heavily degrade performance, motivating a new class of models that are tractable on the full point cloud. To address this issue, we introduce the FastFlow3D architecture, which provides real-time inference on the full point cloud. Additionally, we design human-interpretable metrics that better capture real-world aspects by accounting for ego-motion and providing breakdowns per object type. We hope that this dataset may provide new opportunities for developing real-world scene flow systems.
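The abstract does not give the metric's exact form, but a minimal sketch of an ego-motion-compensated, per-object-type endpoint-error breakdown (the function name and exact error definition here are illustrative assumptions, not the paper's API) could look like:

```python
import numpy as np

def endpoint_error_by_class(pred_flow, gt_flow, class_ids, class_names):
    """Mean 3D endpoint error (meters) per object type.

    pred_flow, gt_flow: (N, 3) per-point flow vectors, assumed to be
    expressed in an ego-motion-compensated frame so that a parked car
    has ~zero ground-truth flow regardless of how the ego vehicle moves.
    class_ids: (N,) integer object-type label per point.
    class_names: list mapping label index -> name (e.g. "vehicle").
    """
    err = np.linalg.norm(pred_flow - gt_flow, axis=1)  # (N,) per-point error
    return {
        name: float(err[class_ids == i].mean())
        for i, name in enumerate(class_names)
        if np.any(class_ids == i)  # skip classes absent from this frame
    }
```

Reporting the error separately per class keeps slow-moving pedestrians from being averaged away by fast vehicles, which matches the abstract's motivation for per-object-type breakdowns.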
To operate intelligently in domestic environments, robots require the ability to understand arbitrary spatial relations between objects and to generalize them to objects of varying sizes and shapes. In this work, we present a novel end-to-end approach to generalize spatial relations based on distance metric learning. We train a neural network to transform 3D point clouds of objects to a metric space that captures the similarity of the depicted spatial relations, using only geometric models of the objects. Our approach employs gradient-based optimization to compute object poses in order to imitate an arbitrary target relation by reducing the distance to it under the learned metric. Our results based on simulated and real-world experiments show that the proposed method enables robots to generalize spatial relations to unknown objects over a continuous spectrum.

Network architecture:
input: 2 × 100 × 100
10 × 10 conv, 32, elu, pool
8 × 8 conv, 42, elu, pool
6 × 6 conv, 64, elu, pool
4 × 4 conv, 64, elu, pool
4 × 4 conv, 128, elu, pool
4 × 4 conv, 128, elu
2 × 2 conv, 128, elu
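Read as a layer table, the specification above translates fairly directly into a convolutional encoder. A minimal PyTorch sketch follows; the class name, the 'same' padding, the 2 × 2 max pooling, and the final projection head are all assumptions, since the excerpt lists only kernel sizes, channel counts, activations, and where pooling occurs:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel, pool):
    # One row of the table: conv + ELU, with 2x2 max pooling where "pool"
    # is listed. Padding/stride are not given in the excerpt; stride-1
    # convolutions with 'same' padding are an assumption.
    layers = [nn.Conv2d(in_ch, out_ch, kernel, padding="same"), nn.ELU()]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

class RelationEmbeddingNet(nn.Module):  # hypothetical name
    def __init__(self, embed_dim=64):   # embedding size is an assumption
        super().__init__()
        # (in_ch, out_ch, kernel, pool) per table row; input is 2x100x100.
        spec = [(2, 32, 10, True), (32, 42, 8, True), (42, 64, 6, True),
                (64, 64, 4, True), (64, 128, 4, True),
                (128, 128, 4, False), (128, 128, 2, False)]
        blocks = []
        for in_ch, out_ch, kernel, pool in spec:
            blocks += conv_block(in_ch, out_ch, kernel, pool)
        self.features = nn.Sequential(*blocks)
        self.head = nn.Linear(128, embed_dim)  # projection to metric space

    def forward(self, x):          # x: (B, 2, 100, 100)
        h = self.features(x)       # (B, 128, 3, 3) with the above choices
        h = h.mean(dim=(2, 3))     # global average pool over space
        return self.head(h)        # embedding used for the learned metric

# Usage sketch: embeddings of two scenes, compared under the learned metric.
net = RelationEmbeddingNet()
a, b = net(torch.randn(1, 2, 100, 100)), net(torch.randn(1, 2, 100, 100))
dist = torch.norm(a - b, dim=1)
```

Under this reading, the gradient-based pose optimization the abstract describes would backpropagate such a distance through the (differentiable) rendering of the object pair into the 2 × 100 × 100 input, updating object poses to shrink the distance to a target relation.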
Estimating scene flow in real-world LiDAR point clouds from an autonomous vehicle. Left: Overlay of two consecutive frames of point clouds (green and blue, respectively) sampled at 10 Hz from the Waymo Open Dataset [48]. White boxes indicate tracked 3D bounding boxes for human-annotated vehicles and pedestrians. Right: Predicted scene flow for each point, colored by direction and brightened by the magnitude of motion based on the overlaid frames. Note that pedestrians and vehicles have distinctly different colors based on the direction of movement.
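The coloring the caption describes (hue from flow direction, brightness from flow magnitude) can be reproduced in a few lines; the specific HSV mapping below is an assumption, not necessarily the paper's exact colormap:

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def flow_colors(flow_xy):
    """Per-point RGB colors from planar scene flow.

    flow_xy: (N, 2) per-point motion in the ground plane.
    Direction sets hue; magnitude sets brightness (value).
    """
    angle = np.arctan2(flow_xy[:, 1], flow_xy[:, 0])        # [-pi, pi]
    hue = (angle + np.pi) / (2 * np.pi)                      # map to [0, 1]
    mag = np.linalg.norm(flow_xy, axis=1)
    value = np.clip(mag / (mag.max() + 1e-9), 0.05, 1.0)     # brightness
    sat = np.ones_like(hue)
    return hsv_to_rgb(np.stack([hue, sat, value], axis=1))   # (N, 3) RGB
```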