Scene flow tracks the 3D motion of each point between adjacent point clouds. It provides fundamental 3D motion perception for autonomous driving and service robots. Although a red green blue depth (RGBD) camera or a light detection and ranging (LiDAR) sensor captures discrete 3D points in space, objects and their motions are usually continuous in the macroworld. That is, objects keep themselves consistent as they flow from the current frame to the next frame. Based on this insight, a generative adversarial network (GAN) is utilized to self-learn 3D scene flow without ground truth. The fake point cloud is synthesized from the predicted scene flow and the point cloud of the first frame. The adversarial training of the generator and discriminator is realized by synthesizing an indistinguishable fake point cloud and discriminating between the real point cloud and the synthesized fake point cloud. Experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset show that our method achieves promising results. Just as a human can, the proposed method identifies similar local structures in two adjacent frames even without knowing the ground-truth scene flow. The local correspondence can then be correctly estimated, and in turn the scene flow is correctly estimated. An interactive preprint version of the article can be found here: https://www.authorea.com/doi/full/10.22541/au.163335790.03073492.
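As a rough illustration of the warping step described above, the following minimal sketch (not the authors' code; the function name and array shapes are assumptions) synthesizes the fake second frame by adding the predicted per-point flow to the first frame:

```python
import numpy as np

def synthesize_fake_frame(points_t, predicted_flow):
    """Warp frame-t points by the predicted per-point scene flow.

    points_t:        (N, 3) array, point cloud of the first frame
    predicted_flow:  (N, 3) array, predicted 3D scene flow per point
    Returns the synthesized ("fake") point cloud for the second frame.
    """
    return points_t + predicted_flow

# Toy usage: a rigid scene translated 0.5 m along x is reproduced exactly
# when the predicted flow matches the true motion.
p1 = np.random.rand(1024, 3).astype(np.float32)
flow = np.zeros_like(p1)
flow[:, 0] = 0.5
p2_fake = synthesize_fake_frame(p1, flow)
```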
Recently, the transformer architecture has gained great success in the computer vision community, for tasks such as image classification and object detection. Nonetheless, its application to 3D vision remains to be explored, given that point clouds are inherently sparse, irregular, and unordered. Furthermore, existing point transformer frameworks usually feed the raw point cloud of dimension N×3 into the transformer, which limits the point processing scale because the computational cost is quadratic in the input size N. In this paper, we rethink the structure of the point transformer. Instead of applying the transformer directly to points, our network (TransLO) can process tens of thousands of points simultaneously by projecting them onto a 2D surface and then feeding them into a local transformer with linear complexity. Specifically, it is mainly composed of two components: a Window-based Masked transformer with Self Attention (WMSA) to capture long-range dependencies, and Masked Cross-Frame Attention (MCFA) to associate two frames and predict the pose. To deal with the sparsity of point clouds, we propose a binary mask to remove invalid and dynamic points. To our knowledge, this is the first transformer-based LiDAR odometry network. Experimental results on the KITTI odometry dataset show that our average rotation and translation RMSE reach 0.500°/100m and 0.993%, respectively. The performance of our network surpasses all recent learning-based methods and even outperforms LOAM on most evaluation sequences. Code will be released at https://github.com/IRMVLab/TransLO.
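To make the projection idea concrete, here is a minimal sketch of mapping a LiDAR point cloud onto a 2D range image together with a binary validity mask. The sensor parameters (a 64-beam layout with a +3°/−25° vertical field of view) are assumptions typical of KITTI, and the code illustrates the generic spherical-projection technique rather than TransLO's actual preprocessing:

```python
import numpy as np

def range_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) point cloud onto an H x W range image and return
    the image plus a binary mask; pixels that receive no point stay invalid,
    mimicking the idea of masking out empty positions before attention."""
    fov_up_rad, fov_down_rad = np.radians(fov_up), np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    depth = np.linalg.norm(points, axis=1)
    valid = depth > 1e-6
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    yaw = np.arctan2(y, x)                                        # azimuth
    pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-6), -1.0, 1.0))

    u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * w), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor((1.0 - (pitch - fov_down_rad) / fov) * h), 0, h - 1).astype(np.int32)

    range_img = np.zeros((h, w), dtype=np.float32)
    mask = np.zeros((h, w), dtype=np.bool_)
    range_img[v[valid], u[valid]] = depth[valid]
    mask[v[valid], u[valid]] = True
    return range_img, mask

pts = np.random.randn(50000, 3).astype(np.float32) * 10.0  # toy cloud
img, mask = range_projection(pts)
```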
3D scene flow characterizes how points at the current time flow to the next time in 3D Euclidean space, which makes it possible to autonomously infer the non-rigid motion of all objects in the scene. Previous methods for estimating scene flow from images have limitations: they split the holistic nature of 3D scene flow by estimating optical flow and disparity separately. Learning 3D scene flow from point clouds also faces the difficulties of the gap between synthesized and real data and the sparsity of LiDAR point clouds. In this paper, a generated dense depth map is utilized to obtain explicit 3D coordinates, which enables direct learning of 3D scene flow from 2D images. The stability of the predicted scene flow is improved by introducing the dense nature of 2D pixels into 3D space. Outliers in the generated 3D point cloud are removed by statistical methods to weaken the impact of noisy points on the 3D scene flow estimation task. A disparity consistency loss is proposed to achieve more effective unsupervised learning of 3D scene flow. The proposed method of self-supervised learning of 3D scene flow from real-world images is compared with a variety of methods that learn on synthesized datasets or on LiDAR point clouds. Comparisons on multiple scene flow metrics demonstrate the effectiveness and superiority of introducing pseudo-LiDAR point clouds into scene flow estimation.
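The two preprocessing steps mentioned above, back-projecting a dense depth map into pseudo-LiDAR coordinates and removing statistical outliers, can be sketched as follows. The intrinsics, neighborhood size, and threshold are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H, W) into an (H*W, 3) point cloud
    with a pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def remove_statistical_outliers(points, k=16, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours exceeds
    the global mean by `std_ratio` standard deviations.
    (Brute-force O(N^2) distances; only suitable for small toy clouds.)"""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    thresh = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean < thresh]

depth = np.full((8, 16), 10.0, dtype=np.float32)   # toy dense depth map
cloud = depth_to_pseudo_lidar(depth, fx=720.0, fy=720.0, cx=8.0, cy=4.0)
clean = remove_statistical_outliers(cloud)
```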
3D scene flow presents the 3D motion of each point in 3D space, which forms the fundamental 3D motion perception for autonomous driving and service robots. Although an RGBD camera or a LiDAR sensor captures discrete 3D points in space, objects and their motions are usually continuous in the macro world. That is, objects keep themselves consistent as they flow from the current frame to the next frame. Based on this insight, a Generative Adversarial Network (GAN) is utilized to self-learn 3D scene flow with no need for ground truth. The fake point cloud of the second frame is synthesized from the predicted scene flow and the point cloud of the first frame. The adversarial training of the generator and discriminator is realized by synthesizing an indistinguishable fake point cloud and discriminating between the real point cloud and the synthesized fake point cloud. Experiments on the KITTI scene flow dataset show that our method achieves promising results without ground truth. Just like a human observing a real-world scene, the proposed approach is capable of determining the consistency of the scene at different moments even though the exact flow value of each point is unknown in advance. Corresponding author(s) Email: wanghesheng@sjtu.edu.cn
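A toy adversarial training loop in this spirit might look like the sketch below. The generator and discriminator architectures shown are simple stand-ins (per-point MLPs with PointNet-style max pooling), not the networks used in the paper, and the loss is a plain binary cross-entropy GAN objective assumed for illustration:

```python
import torch
import torch.nn as nn

class FlowGenerator(nn.Module):
    """Illustrative stand-in: predicts a per-point flow from concatenated frames."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, p1, p2):
        return self.mlp(torch.cat([p1, p2], dim=-1))

class CloudDiscriminator(nn.Module):
    """Illustrative stand-in: global max-pooled point features -> real/fake logit."""
    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
        self.head = nn.Linear(64, 1)

    def forward(self, cloud):
        feat = self.point_mlp(cloud).max(dim=1).values
        return self.head(feat)

gen, disc = FlowGenerator(), CloudDiscriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

p1 = torch.randn(2, 1024, 3)   # frame t
p2 = torch.randn(2, 1024, 3)   # frame t+1 (real second frame)

for _ in range(2):  # toy training steps
    # Discriminator: tell the real second frame from the warped ("fake") one.
    fake = p1 + gen(p1, p2)
    d_loss = bce(disc(p2), torch.ones(2, 1)) + bce(disc(fake.detach()), torch.zeros(2, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: make the warped cloud indistinguishable from the real one.
    fake = p1 + gen(p1, p2)
    g_loss = bce(disc(fake), torch.ones(2, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```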