Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion

Huynh, Lam; Nguyen-Ha, Phong; Matas, Jiřı́; Rahtu, Esa; Heikkilä, Janne

doi:10.1109/iccv48922.2021.01253

Cited by 16 publications

(3 citation statements)

References 32 publications

“…Lam et al. [29] adopted a fully convolutional framework. After inputting an RGB image and sparse 3D point clouds to form a sparse depth map and taking the 3D point clouds as depth constraints onto the images, they created an RGB-D image.…”

Section: Related Work (mentioning)

confidence: 99%

FCNet: Stereo 3D Object Detection with Feature Correlation Networks

Liu, Chen, et al. 2022

Entropy

Deep-learning techniques have significantly improved object detection performance, especially with binocular images in 3D scenarios. To supervise the depth information in stereo 3D object detection, reconstructing the 3D dense depth of LiDAR point clouds causes higher computational costs and lower inference speed. After exploring the intrinsic relationship between the implicit depth information and semantic texture features of the binocular images, we propose an efficient and accurate 3D object detection algorithm, FCNet, in stereo images. First, we construct a multi-scale cost–volume containing implicit depth information using the normalized dot-product by generating multi-scale feature maps from the input stereo images. Secondly, the variant attention model enhances its global and local description, and the sparse region monitors the depth loss deep regression. Thirdly, for balancing the channel information preservation of the re-fused left–right feature maps and computational burden, a reweighting strategy is employed to enhance the feature correlation in merging the last-layer features of binocular images. Extensive experiment results on the challenging KITTI benchmark demonstrate that the proposed algorithm achieves better performance, including a lower computational cost and higher inference speed in 3D object detection.

“…In a more general 3D reconstruction pipeline, well-triangulated 2D features and corresponding 3D keypoints are a by-product of most SFM algorithms [29]. The PatchMatch framework is flexible enough to support any kind of initial solution, without requiring to design complex neural architectures to extract representations from sparse input data [30].…”

Section: B Keypoint-based Initialization (mentioning)

confidence: 99%

Revisiting PatchMatch Multi-View Stereo for Urban 3D Reconstruction

Marco, Paolo, Medici, et al. 2022

2022 IEEE Intelligent Vehicles Symposium (IV)

In this paper, a complete pipeline for image-based 3D reconstruction of urban scenarios is proposed, based on PatchMatch Multi-View Stereo (MVS). Input images are first fed into an off-the-shelf visual SLAM system to extract camera poses and sparse keypoints, which are used to initialize PatchMatch optimization. Then, pixelwise depths and normals are iteratively computed in a multi-scale framework with a novel depth-normal consistency loss term and a global refinement algorithm to balance the inherently local nature of PatchMatch. Finally, a large-scale point cloud is generated by back-projecting multi-view consistent estimates in 3D. The proposed approach is carefully evaluated against both classical MVS algorithms and monocular depth networks on the KITTI dataset, showing state-of-the-art performance.

“…Monocular Depth Estimation: Monocular depth estimation has been recently shifted to improving neural network architectures and optimizing methods [59,82,85,35,10,48,65,87], integrating hierarchical features [85,57,65], leveraging camera motion between pairs of frames [97,51,41,72], taking advantage of planner guidance [57,47] and 3D geometric constraints [62,21,61,46]. More recently, audio [81,38,69] has been introduced to help for estimating depth.…”

Section: Related Work (mentioning)

confidence: 99%

Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision

Li, Rahtu, Zhao 2022

Preprint

Self Cite

This paper focuses on perceiving and navigating 3D environments using echoes and RGB images. In particular, we perform depth estimation by fusing an RGB image with echoes received from multiple orientations. Unlike previous works, we go beyond the field of view of the RGB image and estimate dense depth maps for substantially larger parts of the environment. We show that the echoes provide holistic and inexpensive information about the 3D structures, complementing the RGB image. Moreover, we study how echoes and the wide field-of-view depth maps can be utilised in robot navigation. We compare the proposed methods against recent baselines using two sets of challenging realistic 3D environments: Replica and Matterport3D. The implementation and pre-trained models will be made publicly available.
