M-FUSE: Multi-frame Fusion for Scene Flow Estimation

Mehl, Lukas; Jahedi, Azin; Schmalfuß, Jenny; Bruhn, Andrés

doi:10.1109/wacv56688.2023.00206

Cited by 12 publications

(13 citation statements)

References 38 publications

“…Recently, deep learning has demonstrated powerful capabilities in end-to-end learning of scene flow estimation from stereo inputs [24,32,41]. Additionally, approaches that leverage pre-existing 3D structure through inputs of RGB-D sequences [31,39,45,33] or Lidar points [28,56,38,55,12,11,52] have also been proposed for various scenarios. Monocular scene flow.…”

Section: Related Work (mentioning)

confidence: 99%

“…To enable self-supervised training, the estimated depth $D_1$ of the first image and the SE3 motion field $T_{1\to2}$ are first converted into the scene flow representation $(u, v, \Delta D)$ with known camera intrinsics [33], where $(u, v)$ denotes the standard optical flow $F_{1\to2}$, and $\Delta D$ denotes the depth change registered to the first frame $I_1$. We denote $D'_1 = D_1 + \Delta D$, which represents the transformed depth map registered to the first frame.…”

Section: Self-supervised Loss (mentioning)

confidence: 99%
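
The conversion described in the quoted passage above (depth plus a per-pixel SE3 motion field turned into optical flow and depth change) can be illustrated with a minimal NumPy sketch. This is not the citing authors' code: the function name `scene_flow_from_depth_and_se3`, the array layouts, and the pinhole camera model with intrinsics matrix `K` are all assumptions made for illustration.

```python
import numpy as np

def scene_flow_from_depth_and_se3(depth1, R, t, K):
    """Hypothetical helper: convert depth + per-pixel rigid motion to (u, v, delta_d).

    depth1: (H, W) depth of the first frame
    R:      (H, W, 3, 3) per-pixel rotation matrices
    t:      (H, W, 3) per-pixel translations
    K:      (3, 3) pinhole intrinsics with last row [0, 0, 1] (assumed)
    """
    H, W = depth1.shape
    # Pixel coordinates of the first frame in homogeneous form.
    xs, ys = np.meshgrid(np.arange(W, dtype=np.float64),
                         np.arange(H, dtype=np.float64))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)      # (H, W, 3)

    # Back-project every pixel to a 3D point: X = D * K^{-1} p.
    rays = pix @ np.linalg.inv(K).T
    pts1 = rays * depth1[..., None]

    # Apply the per-pixel rigid motion: X' = R X + t.
    pts2 = np.einsum('hwij,hwj->hwi', R, pts1) + t

    # Project the moved points back into the image plane.
    proj = pts2 @ K.T
    z2 = proj[..., 2]
    u = proj[..., 0] / z2 - xs          # horizontal optical flow
    v = proj[..., 1] / z2 - ys          # vertical optical flow
    delta_d = z2 - depth1               # depth change registered to frame 1
    return u, v, delta_d
```

Adding `delta_d` to `depth1` then gives the transformed depth map registered to the first frame, as in the quoted passage.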

“…based on various input modalities, including stereo images [3,24,32,41,51,40], RGB-D pairs [31,39,45,33], or Lidar points [28,18,54,56,38,55,12,7,11,52]. These methods, however, either require strict sensor calibrations (e.g., stereo-based), or expensive devices (e.g., RGB-D or Lidar-based) for achieving satisfactory performance, which restricts their widespread applications.…”

Section: Introduction (mentioning)

confidence: 99%

Self-Supervised Ego-Motion Estimation Based on Multi-Layer Fusion of RGB and Inferred Depth

Jiang, Taira, Miyashita, et al. 2022

2022 International Conference on Robotics and Automation (ICRA)

Self-supervised monocular scene flow estimation, aiming to understand both 3D structures and 3D motions from two temporally consecutive monocular images, has received increasing attention for its simple and economical sensor setup. However, the accuracy of current methods suffers from the bottleneck of less-efficient network architecture and lack of motion rigidity for regularization. In this paper, we propose a superior model named EMR-MSF by borrowing the advantages of network architecture design under the scope of supervised learning. We further impose explicit and robust geometric constraints with an elaborately constructed ego-motion aggregation module where a rigidity soft mask is proposed to filter out dynamic regions for stable ego-motion estimation using static regions. Moreover, we propose a motion consistency loss along with a mask regularization loss to fully exploit static regions. Several efficient training strategies are integrated including a gradient detachment technique and an enhanced view synthesis process for better performance. Our proposed method outperforms the previous self-supervised works by a large margin and catches up to the performance of supervised methods. On the KITTI scene flow benchmark, our approach improves the SF-all metric of the state-of-the-art self-supervised monocular method by 44% and demonstrates superior performance across sub-tasks including depth and visual odometry, amongst other self-supervised single-task or multi-task methods.

“…However, none of these studies exploit the useful temporal information from previous point cloud frames. Extensive studies on optical flow estimation [16], [20], [22], [42], [50], [52], [71], [86] and (a) have shown that scene flow in consecutive frames are similar to each other (i.e., the upper left color wheel represents the flow magnitude and direction). To this end, an intuitive approach for exploiting temporal information, namely Joint, is to force a single FNSF to jointly estimate the previous flow (t-1 → t) and the current flow (t → t+1).…”

Section: Introduction (mentioning)

confidence: 99%

“…lizing such valuable temporal information for improving the two-frame point cloud scene flow estimations. Such a gap is particularly unexpected, because the extensive body of research in optical flow estimation [20], [22], [42], [50], [52], [71], [86] have shown the importance of temporal information from previous frames, even amidst rapid motion changes in optical flow. For instance, as illustrated in Figure 1(a), it is evident that flows between consecutive frames bear a significant resemblance to each other, underscoring the potential benefits of integrating temporal insights into scene flow estimation for two-frame point clouds.…”

Section: Introduction (mentioning)

confidence: 99%

Wavelet-based self-supervised learning for multi-scene image fusion

Liu, Qiao, et al. 2022

Neural Comput & Applic

Neural Scene Flow Prior (NSFP) and Fast Neural Scene Flow (FNSF) have shown remarkable adaptability in the context of large out-of-distribution autonomous driving. Despite their success, the underlying reasons for their astonishing generalization capabilities remain unclear. Our research addresses this gap by examining the generalization capabilities of NSFP through the lens of uniform stability, revealing that its performance is inversely proportional to the number of input point clouds. This finding sheds light on NSFP's effectiveness in handling large-scale point cloud scene flow estimation tasks. Motivated by such theoretical insights, we further explore the improvement of scene flow estimation by leveraging historical point clouds across multiple frames, which inherently increases the number of point clouds. Consequently, we propose a simple and effective method for multi-frame point cloud scene flow estimation, along with a theoretical evaluation of its generalization abilities. Our analysis confirms that the proposed method maintains a limited generalization error, suggesting that adding multiple frames to the scene flow optimization process does not detract from its generalizability. Extensive experimental results on large-scale autonomous driving Waymo Open and Argoverse lidar datasets demonstrate that the proposed method achieves state-of-the-art performance.

GloFP-MSF: monocular scene flow estimation with global feature perception

Xiang, Cui, Wang, et al. 2024

Multimedia Systems

Monocular scene flow estimation is a task that allows us to obtain 3D structure and 3D motion from consecutive monocular images. Previous monocular scene flow usually focused on the enhancement of image features and motion features directly while neglecting the utilization of motion features and image features in the decoder, which are equally crucial for accurate scene flow estimation. In this paper, we propose a global feature perception module (GFPM) based on cross-covariance attention and apply it to the decoder, which enables the decoder to utilize the motion features and image features of the current layer as well as the coarse estimation result of the scene flow of the previous layer effectively, thus enhancing the decoder's recovery of 3D motion information. In addition, we also propose a parallel architecture of self-attention and convolution (PCSA) for feature extraction, which can enhance the global expression ability of extracted image features. Our proposed method demonstrates remarkable performance on the KITTI 2015 dataset, achieving a relative improvement of 17.6% compared to the baseline approach. Compared to other recent methods, the proposed model achieves competitive results.
