Scene flow tracks the 3D motion of each point between adjacent point clouds. It provides fundamental 3D motion perception for autonomous driving and service robots. Although a red green blue depth (RGBD) camera or a light detection and ranging (LiDAR) sensor captures discrete 3D points in space, objects and their motions are usually continuous in the macroworld. That is, objects keep themselves consistent as they flow from the current frame to the next frame. Based on this insight, a generative adversarial network (GAN) is utilized to self-learn 3D scene flow without ground truth. The fake point cloud is synthesized from the predicted scene flow and the point cloud of the first frame. The adversarial training of the generator and discriminator is realized by synthesizing an indistinguishable fake point cloud and discriminating between the real point cloud and the synthesized fake point cloud. Experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset show that our method achieves promising results. Just as a human can, the proposed method identifies similar local structures in two adjacent frames even without knowing the ground-truth scene flow. The local correspondence can then be correctly estimated, and in turn the scene flow is correctly estimated. An interactive preprint version of the article can be found here: https://www.authorea.com/doi/full/10.22541/au.163335790.03073492.
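As a rough illustration of the warping step described above, the following minimal sketch (not the authors' code; the function name and array shapes are assumptions) synthesizes the fake second frame by adding the predicted per-point flow to the first frame:

```python
import numpy as np

def synthesize_fake_frame(points_t, predicted_flow):
    """Warp frame-t points by the predicted per-point scene flow.

    points_t:        (N, 3) array, point cloud of the first frame
    predicted_flow:  (N, 3) array, predicted 3D scene flow per point
    Returns the synthesized ("fake") point cloud for the second frame.
    """
    return points_t + predicted_flow

# Toy usage: a rigid scene translated 0.5 m along x is reproduced exactly
# when the predicted flow matches the true motion.
p1 = np.random.rand(1024, 3).astype(np.float32)
flow = np.zeros_like(p1)
flow[:, 0] = 0.5
p2_fake = synthesize_fake_frame(p1, flow)
```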
Recently, the transformer architecture has gained great success in the computer vision community, for tasks such as image classification and object detection. Nonetheless, its application to 3D vision remains to be explored, given that point clouds are inherently sparse, irregular, and unordered. Furthermore, existing point transformer frameworks usually feed the raw point cloud of dimension N×3 into the transformer, which limits the point processing scale because the computational cost is quadratic in the input size N. In this paper, we rethink the structure of the point transformer. Instead of applying the transformer directly to points, our network (TransLO) can process tens of thousands of points simultaneously by projecting them onto a 2D surface and then feeding them into a local transformer with linear complexity. Specifically, it is mainly composed of two components: a Window-based Masked transformer with Self Attention (WMSA) to capture long-range dependencies, and Masked Cross-Frame Attention (MCFA) to associate two frames and predict the pose. To deal with the sparsity of point clouds, we propose a binary mask to remove invalid and dynamic points. To our knowledge, this is the first transformer-based LiDAR odometry network. Experimental results on the KITTI odometry dataset show that our average rotation and translation RMSE reach 0.500°/100m and 0.993%, respectively. The performance of our network surpasses all recent learning-based methods and even outperforms LOAM on most evaluation sequences. Code will be released at https://github.com/IRMVLab/TransLO.
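To make the projection idea concrete, here is a minimal sketch of mapping a LiDAR point cloud onto a 2D range image together with a binary validity mask. The sensor parameters (a 64-beam layout with a +3°/−25° vertical field of view) are assumptions typical of KITTI, and the code illustrates the generic spherical-projection technique rather than TransLO's actual preprocessing:

```python
import numpy as np

def range_projection(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) point cloud onto an H x W range image and return
    the image plus a binary mask; pixels that receive no point stay invalid,
    mimicking the idea of masking out empty positions before attention."""
    fov_up_rad, fov_down_rad = np.radians(fov_up), np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    depth = np.linalg.norm(points, axis=1)
    valid = depth > 1e-6
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    yaw = np.arctan2(y, x)                                        # azimuth
    pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-6), -1.0, 1.0))

    u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * w), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor((1.0 - (pitch - fov_down_rad) / fov) * h), 0, h - 1).astype(np.int32)

    range_img = np.zeros((h, w), dtype=np.float32)
    mask = np.zeros((h, w), dtype=np.bool_)
    range_img[v[valid], u[valid]] = depth[valid]
    mask[v[valid], u[valid]] = True
    return range_img, mask

pts = np.random.randn(50000, 3).astype(np.float32) * 10.0  # toy cloud
img, mask = range_projection(pts)
```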
3D scene flow characterizes how points at the current time flow to the next time in 3D Euclidean space, which makes it possible to autonomously infer the non-rigid motion of all objects in the scene. Previous methods for estimating scene flow from images have limitations: they split the holistic nature of 3D scene flow by estimating optical flow and disparity separately. Learning 3D scene flow from point clouds also faces the difficulties of the gap between synthesized and real data and the sparsity of LiDAR point clouds. In this paper, a generated dense depth map is utilized to obtain explicit 3D coordinates, which enables direct learning of 3D scene flow from 2D images. The stability of the predicted scene flow is improved by introducing the dense nature of 2D pixels into 3D space. Outliers in the generated 3D point cloud are removed by statistical methods to weaken the impact of noisy points on the 3D scene flow estimation task. A disparity consistency loss is proposed to achieve more effective unsupervised learning of 3D scene flow. The proposed method of self-supervised learning of 3D scene flow from real-world images is compared with a variety of methods that learn on synthesized datasets or on LiDAR point clouds. Comparisons on multiple scene flow metrics demonstrate the effectiveness and superiority of introducing pseudo-LiDAR point clouds into scene flow estimation.
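The two preprocessing steps mentioned above, back-projecting a dense depth map into pseudo-LiDAR coordinates and removing statistical outliers, can be sketched as follows. The intrinsics, neighborhood size, and threshold are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H, W) into an (H*W, 3) point cloud
    with a pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def remove_statistical_outliers(points, k=16, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours exceeds
    the global mean by `std_ratio` standard deviations.
    (Brute-force O(N^2) distances; only suitable for small toy clouds.)"""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    thresh = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean < thresh]

depth = np.full((8, 16), 10.0, dtype=np.float32)   # toy dense depth map
cloud = depth_to_pseudo_lidar(depth, fx=720.0, fy=720.0, cx=8.0, cy=4.0)
clean = remove_statistical_outliers(cloud)
```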
3D scene flow presents the 3D motion of each point in 3D space, which forms the fundamental 3D motion perception for autonomous driving and service robots. Although an RGBD camera or a LiDAR sensor captures discrete 3D points in space, objects and their motions are usually continuous in the macro world. That is, objects keep themselves consistent as they flow from the current frame to the next frame. Based on this insight, a Generative Adversarial Network (GAN) is utilized to self-learn 3D scene flow with no need for ground truth. The fake point cloud of the second frame is synthesized from the predicted scene flow and the point cloud of the first frame. The adversarial training of the generator and discriminator is realized by synthesizing an indistinguishable fake point cloud and discriminating between the real point cloud and the synthesized fake point cloud. Experiments on the KITTI scene flow dataset show that our method achieves promising results without ground truth. Just like a human observing a real-world scene, the proposed approach is capable of determining the consistency of the scene at different moments even though the exact flow value of each point is unknown in advance. Corresponding author(s) Email: wanghesheng@sjtu.edu.cn
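A toy adversarial training loop in this spirit might look like the sketch below. The generator and discriminator architectures shown are simple stand-ins (per-point MLPs with PointNet-style max pooling), not the networks used in the paper, and the loss is a plain binary cross-entropy GAN objective assumed for illustration:

```python
import torch
import torch.nn as nn

class FlowGenerator(nn.Module):
    """Illustrative stand-in: predicts a per-point flow from concatenated frames."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, p1, p2):
        return self.mlp(torch.cat([p1, p2], dim=-1))

class CloudDiscriminator(nn.Module):
    """Illustrative stand-in: global max-pooled point features -> real/fake logit."""
    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
        self.head = nn.Linear(64, 1)

    def forward(self, cloud):
        feat = self.point_mlp(cloud).max(dim=1).values
        return self.head(feat)

gen, disc = FlowGenerator(), CloudDiscriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

p1 = torch.randn(2, 1024, 3)   # frame t
p2 = torch.randn(2, 1024, 3)   # frame t+1 (real second frame)

for _ in range(2):  # toy training steps
    # Discriminator: tell the real second frame from the warped ("fake") one.
    fake = p1 + gen(p1, p2)
    d_loss = bce(disc(p2), torch.ones(2, 1)) + bce(disc(fake.detach()), torch.zeros(2, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: make the warped cloud indistinguishable from the real one.
    fake = p1 + gen(p1, p2)
    g_loss = bce(disc(fake), torch.ones(2, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```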