2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.00374
360MonoDepth: High-Resolution 360° Monocular Depth Estimation

Abstract: 360° cameras can capture complete environments in a single shot, which makes 360° imagery alluring in many computer vision tasks. However, monocular depth estimation remains a challenge for 360° data, particularly for high resolutions like 2K (2048×1024) and beyond that are important for novel-view synthesis and virtual reality applications. Current CNN-based methods do not support such high resolutions due to limited GPU memory. In this work, we propose a flexible framework for monocular depth estimation from hi…

Cited by 39 publications (9 citation statements) · References 77 publications
“…4 reports the comparison on two panoramic depth completion datasets. We observe that, with the fewest parameters, RigNet++ is still significantly superior to UniFuse [30], HoHoNet [59], GuideNet [63], 360Depth [53], and M³PT [71]. For example, compared to M³PT, which uses an additional masked pre-training strategy [19], RigNet++ still achieves 15.1% lower RMSE on average and higher δᵢ, although their REL metrics are marginally close.…”
Section: Evaluation on Indoor NYUv2
confidence: 86%
“…It has been shown that leveraging transformer architectures in 360° image modeling reduces the distortions caused by projection and rotation [6]. For this reason, recent approaches [44, 45], including PAVER [60], PanoFormer [48], and Text2Light [4], use transformers to achieve global structural consistency.…”
Section: Related Work
confidence: 99%
“…Since BiFuse requires a large number of panoramas as well as ground-truth depth captured by laser sensors, which makes data collection costly, BiFuse++ proposes a new fusion module and a contrast-aware photometric loss to reduce the algorithm's need for ground-truth depth and to improve both the performance of the two-branch fusion network and the stability of self-training on real video. Since tangent images exhibit less distortion than equirectangular panoramas, Rey-Area et al. [23] applied an ordinary perspective depth estimation algorithm [9] to tangent-image projections of the panorama and reprojected the predicted tangent depth maps back into the equirectangular panorama for alignment and fusion. Ai et al. [24] proposed a novel solution for monocular panoramic depth estimation, which predicts an ERP-format depth map by collaboratively learning holistic and regional depth distributions.…”
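The tangent-image pipeline described in this citation statement hinges on mapping between equirectangular (ERP) spherical coordinates and a tangent plane via the gnomonic projection. The sketch below is a minimal illustration of that mapping, not the authors' implementation; the function names and the scalar-coordinate interface are assumptions for clarity.

```python
import numpy as np

def sphere_to_tangent(lon, lat, lon0, lat0):
    """Gnomonic projection: spherical coords (radians) -> tangent-plane (x, y).

    (lon0, lat0) is the tangent point where the plane touches the unit sphere.
    Valid for points on the hemisphere facing the tangent point (cos_c > 0).
    """
    cos_c = (np.sin(lat0) * np.sin(lat)
             + np.cos(lat0) * np.cos(lat) * np.cos(lon - lon0))
    x = np.cos(lat) * np.sin(lon - lon0) / cos_c
    y = (np.cos(lat0) * np.sin(lat)
         - np.sin(lat0) * np.cos(lat) * np.cos(lon - lon0)) / cos_c
    return x, y

def tangent_to_sphere(x, y, lon0, lat0):
    """Inverse gnomonic projection: tangent-plane (x, y) -> spherical (lon, lat)."""
    rho = np.hypot(x, y)
    c = np.arctan(rho)  # angular distance from the tangent point
    # Guard rho == 0 (the tangent point itself) to avoid division by zero.
    rho = np.where(rho == 0, 1e-12, rho)
    lat = np.arcsin(np.cos(c) * np.sin(lat0)
                    + y * np.sin(c) * np.cos(lat0) / rho)
    lon = lon0 + np.arctan2(
        x * np.sin(c),
        rho * np.cos(lat0) * np.cos(c) - y * np.sin(lat0) * np.sin(c))
    return lon, lat
```

A perspective depth network runs on each tangent image; the inverse mapping then tells every ERP pixel which tangent-plane location to sample when reprojecting the predicted depth back for alignment and fusion.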
Section: Related Work
confidence: 99%