Domain gap in adapting self-supervised depth estimation methods for stereo-endoscopy

Sharan, Lalith; Burger, Lukas; Kostiuchik, Georgii; Wolf, Ivo; Kretzler, Matthias; De Simone, Raffaele; Engelhardt, Sandy

doi:10.1515/cdbme-2020-0004

Cited by 9 publications

(4 citation statements)

References 12 publications


“…Besides, due to the nature of data scarcity and data privacy in the surgical domain, it is particularly challenging to acquire intra-operative data in comparison to other imaging modalities. Furthermore, surgical data is highly heterogeneous, due to varying lighting conditions, acquisition angles, and the number and type of objects in the scene [63]. It can indeed be observed that the detector network Det_sim is able to better learn the distribution of the simulator domain than the Det_or network learns the distribution of the intraoperative domain (mean PPV +14.41, mean TPR +27.27, mean F1 +0.2450; cf.…”

Section: Discussion (mentioning)

confidence: 98%

Mutually improved endoscopic image synthesis and landmark detection in unpaired image-to-image translation

Sharan, Romano, Koehler et al. 2021

Preprint

Self Cite


The CycleGAN framework allows for unsupervised image-to-image translation of unpaired data. In a scenario of surgical training on a physical surgical simulator, this method can be used to transform endoscopic images of phantoms into images that more closely resemble the intra-operative appearance of the same surgical target structure. This can be viewed as a novel augmented reality approach, which we coined Hyperrealism in previous work. In this use case, it is of paramount importance to display objects like needles, sutures or instruments consistently in both domains while altering the style to a more tissue-like appearance. Segmentation of these objects would allow for a direct transfer; however, contouring these partly tiny and thin foreground objects is cumbersome and potentially inaccurate. Instead, we propose to use landmark detection on the points where sutures pass into the tissue. This objective is directly incorporated into a CycleGAN framework by treating the performance of pre-trained detector models as an additional optimization goal. We show that a task defined on these sparse landmark labels improves the consistency of synthesis by the generator network in both domains. Comparing a baseline CycleGAN architecture to our proposed extension (DetCycleGAN), mean precision (PPV) improved by +61.32, mean sensitivity (TPR) by +37.91, and mean F1 score by +0.4743. Furthermore, it could be shown that by dataset fusion, generated intraoperative images can be leveraged as additional training data for the detection network itself. The data is released within the scope of the AdaptOR MICCAI Challenge 2021 at https://adaptor2021.github.io/, and code at https://github.com/Cardio-AI/detcyclegan_pytorch.
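The detection scores quoted above (PPV, TPR, F1) are standard precision, sensitivity and their harmonic mean. The sketch below shows how they can be computed for point landmarks; it is not the authors' evaluation code. The greedy nearest-match rule, the pixel threshold `max_dist`, and the helper names `match_landmarks` and `detection_scores` are hypothetical assumptions for illustration. Note that the abstract appears to report PPV and TPR changes in percent and F1 on a 0 to 1 scale.

```python
import math


def match_landmarks(preds, gts, max_dist):
    """Hypothetical matching rule: each prediction (in order) claims the closest
    still-unmatched ground-truth point within max_dist pixels, if any."""
    unmatched = list(gts)
    tp = 0
    for p in preds:
        best_i, best_d = None, max_dist
        for i, g in enumerate(unmatched):
            d = math.dist(p, g)  # Euclidean distance in pixels
            if d <= best_d:
                best_i, best_d = i, d
        if best_i is not None:
            unmatched.pop(best_i)  # a ground-truth point can be matched only once
            tp += 1
    fp = len(preds) - tp  # predictions without a nearby ground-truth point
    fn = len(unmatched)   # ground-truth points nobody detected
    return tp, fp, fn


def detection_scores(tp, fp, fn):
    """Precision (PPV), sensitivity (TPR) and F1 on a 0-1 scale."""
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return ppv, tpr, f1


if __name__ == "__main__":
    preds = [(10, 10), (50, 50), (90, 90)]
    gts = [(11, 10), (52, 49), (200, 200)]
    ppv, tpr, f1 = detection_scores(*match_landmarks(preds, gts, max_dist=5.0))
    print(round(ppv, 3), round(tpr, 3), round(f1, 3))  # prints 0.667 0.667 0.667
```

Greedy matching in prediction order is a simplification; an optimal assignment (for example, Hungarian matching) can give slightly different counts when landmarks are densely packed.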


“…However, this architecture is in disuse for depth learning since GANs produce depth maps that prioritize realism over accuracy. Self-supervised learning is a natural choice for real medical endoscopies to overcome the lack of depth labels on the target domain [37]-[39]. Although depth or stereo sensors are not common for in-vivo procedures, several works are trained with real stereo endoscopies [40]-[42].…”

Section: B. Single-view Depth Learning (mentioning)

confidence: 99%

On the Uncertain Single-View Depths in Endoscopies

Rodríguez-Puigvert, Recasens, Civera et al. 2021

Preprint


Estimating depth from endoscopic images is a prerequisite for a wide set of AI-assisted technologies, namely accurate localization, measurement of tumors, or identification of non-inspected areas. As the domain specificity of colonoscopies (a deformable, low-texture environment with fluids, poor lighting conditions and abrupt sensor motions) poses challenges to multi-view approaches, single-view depth learning stands out as a promising line of research. In this paper, we explore for the first time Bayesian deep networks for single-view depth estimation in colonoscopies. Their uncertainty quantification offers great potential for such a critical application area. Our specific contribution is twofold: 1) an exhaustive analysis of Bayesian deep networks for depth estimation in three different datasets, highlighting challenges and conclusions regarding synthetic-to-real domain changes and supervised vs. self-supervised methods; and 2) a novel teacher-student approach to deep depth learning that takes into account the teacher uncertainty.


“…Deep learning has shown impressive results in complex computer vision tasks such as segmentation, depth perception, and pose estimation [7,26,30]. These approaches work well on feature rich datasets like road scenes but perform poorly for environments such as medical endoscopy as shown in [24]. This is because of poor texture information and the lack of photometric constancy between frames in endoscopy due to the joint motion between the camera and light source [14].…”

Section: Related Work (mentioning)

confidence: 99%

3D Semantic Mapping from Arthroscopy using Out-of-distribution Pose and Depth and In-distribution Segmentation Training

Jonmohamadi, Ali, Liu et al. 2021

Preprint


Minimally invasive surgery (MIS) has many documented advantages, but the surgeon's limited visual contact with the scene can be problematic. Hence, systems that can help surgeons navigate, such as a method that can produce a 3D semantic map, can compensate for this limitation. In theory, we can borrow 3D semantic mapping techniques developed for robotics, but this requires finding solutions to the following challenges in MIS: 1) semantic segmentation, 2) depth estimation, and 3) pose estimation. In this paper, we propose the first 3D semantic mapping system from knee arthroscopy that solves the three challenges above. Using out-of-distribution non-human datasets, where pose could be labeled, we jointly train depth+pose estimators using self-supervised and supervised losses. Using an in-distribution human knee dataset, we train a fully-supervised semantic segmentation system to label arthroscopic image pixels into femur, ACL, and meniscus. Using test images of human knees, we combine the results from these two systems to automatically create 3D semantic maps of the human knee. The result of this work opens the pathway to the generation of intraoperative 3D semantic mapping, registration with pre-operative data, and robotic-assisted arthroscopy.
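Self-supervised depth training typically relies on a photometric loss: one view is re-synthesized from another using the predicted geometry, and the intensity difference is penalized. The excerpt above attributes poor endoscopic performance partly to the missing photometric constancy when the light source moves with the camera. The toy sketch below illustrates that point under stated assumptions; it is not the method of any cited work. The 1-D image row, the nearest-pixel warp, the 20% illumination drop, and the helper names `reconstruct_row` and `photometric_l1` are all hypothetical.

```python
def reconstruct_row(source_row, disparity):
    """Toy 1-D view synthesis: sample the source row at x - d
    (nearest pixel, clamped at the image border)."""
    n = len(source_row)
    return [source_row[min(max(x - d, 0), n - 1)] for x, d in enumerate(disparity)]


def photometric_l1(target_row, reconstructed_row):
    """Mean absolute intensity difference, the simplest photometric loss."""
    diffs = [abs(t - r) for t, r in zip(target_row, reconstructed_row)]
    return sum(diffs) / len(diffs)


source = [0.1, 0.2, 0.4, 0.6, 0.8]
true_disparity = [1] * len(source)  # hypothetical ground-truth geometry

# Constant illumination: the target view is exactly the shifted source.
target_const = reconstruct_row(source, true_disparity)
# Light source moved with the camera: same geometry, 20% darker (hypothetical gain).
target_lit = [0.8 * v for v in target_const]

prediction = reconstruct_row(source, true_disparity)  # a perfect depth estimate
loss_const = photometric_l1(target_const, prediction)
loss_lit = photometric_l1(target_lit, prediction)
print(loss_const, round(loss_lit, 3))  # prints 0.0 0.056
```

Even with perfect geometry, the loss is nonzero once illumination changes between views, so the training signal partly penalizes correct depth; this is one way to read the excerpt's claim about endoscopy.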

