2022
DOI: 10.48550/arxiv.2205.15034
Preprint

SMUDLP: Self-Teaching Multi-Frame Unsupervised Endoscopic Depth Estimation with Learnable Patchmatch

Abstract: Unsupervised monocular trained depth estimation models make use of adjacent frames as a supervisory signal during the training phase. However, temporally correlated frames are also available at inference time for many clinical applications, e.g., surgical navigation. The vast majority of monocular systems do not exploit this valuable signal that could be deployed to enhance the depth estimates. Those that do, achieve only limited gains due to the unique challenges in endoscopic scenes, such as low and homogene…

Cited by 1 publication (2 citation statements)
References 30 publications
“…The self-supervised method (Watson et al 2021) removes the requirement for human annotations and depth supervision, and it proposes a teacher-student training architecture to encourage the network to ignore unreliable regions in MVS cost volume. (Feng et al 2022) further improves the depth accuracy in regions of dynamic objects by disentangling object motions, and (Shao et al 2022) utilizes Deformable Convolution Networks (DCNs (Dai et al 2017)) to enhance the depth estimates in low-texture and homogeneous-texture regions. However, these methods enforce consistency between MVS depth and monocular depth, which underuses the geometric reasoning of MVS.…”
Section: Multi-frame Depth Learning (mentioning)
confidence: 99%
“…However, MVS is still challenged by unsatisfactory reconstructions in real-world scenes with non-Lambertian surfaces, textureless areas, and moving objects (Knapitsch et al 2017; Schöps et al 2017). To tackle these problems, teacher-student training architectures (Watson et al 2021; Feng et al 2022; Shao et al 2022) are proposed to enforce consistency between monocular depth and MVS depth. However, the consistency pushes MVS depth to mimic monocular depth, which underuses the multi-view geometry, thus the performance of these methods is limited.…”
Section: Introduction (mentioning)
confidence: 99%