2021
DOI: 10.1109/tmi.2021.3057884
Image Compositing for Segmentation of Surgical Tools Without Manual Annotations

Abstract: Producing manual, pixel-accurate, image segmentation labels is tedious and time-consuming. This is often a rate-limiting factor when large amounts of labeled images are required, such as for training deep convolutional networks for instrument-background segmentation in surgical scenes. No large datasets comparable to industry standards in the computer vision community are available for this task. To circumvent this problem, we propose to automate the creation of a realistic training dataset by exploiting techn…
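The title and abstract describe compositing tool foregrounds onto background scenes so that the segmentation label comes for free from the compositing mask rather than from manual annotation. The snippet below is a minimal sketch of that general idea, not the authors' pipeline: it assumes a pre-cut tool image with a binary mask and a background frame given as NumPy arrays, the function name composite_pair is illustrative, and the blending, colour harmonisation and augmentation steps a realistic dataset would need are omitted.

```python
import numpy as np

def composite_pair(background, tool_rgb, tool_mask, top_left):
    """Paste a tool cutout onto a background frame and return (image, label).

    background : (H, W, 3) uint8 background scene
    tool_rgb   : (h, w, 3) uint8 tool cutout
    tool_mask  : (h, w)    binary mask of the tool pixels in the cutout
    top_left   : (row, col) where the cutout is placed in the background
    """
    image = background.copy()
    label = np.zeros(background.shape[:2], dtype=np.uint8)

    r, c = top_left
    h, w = tool_mask.shape
    on = tool_mask.astype(bool)
    region = image[r:r + h, c:c + w]

    # Copy tool pixels where the mask is on; the same mask becomes the label.
    region[on] = tool_rgb[on]
    label[r:r + h, c:c + w] = on.astype(np.uint8)
    return image, label

# Toy usage with random stand-ins for a real frame and a real tool cutout.
bg = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
tool = np.random.randint(0, 255, (100, 40, 3), dtype=np.uint8)
mask = np.ones((100, 40), dtype=np.uint8)
img, lbl = composite_pair(bg, tool, mask, top_left=(200, 300))
```

Each call yields an (image, label) pair, so, in this scheme, label production scales with compute rather than with annotator time, which is the point the abstract makes.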

Cited by 36 publications (12 citation statements)
References 41 publications
“…A downside of using semantic segmentation in comparison to a detector is the increased annotation time required to build a suitable training set. However, recent advances to reduce the number of contour annotations needed to achieve the segmentation, such as (Vardazaryan et al., 2018; Fuentes-Hurtado et al., 2019; Garcia-Peraza-Herrera et al., 2021), greatly mitigate this drawback.…”
Section: Markerless Instrument Localization
Citation type: mentioning (confidence: 99%)
“…Within our proposed platform, we introduce a novel tooltip localization method based on a hybrid mixture of deep learning and classical computer vision. In contrast to other tool localization methods in the literature, the proposed approach does not require manual annotations of the tooltips, but relies on tool segmentation, which is advantageous as the manual annotation effort could be trivially waived employing methods such as that recently proposed in Garcia-Peraza-Herrera et al. (2021). This vision pipeline was individually validated and the proposed tooltip localization method was able to detect tips in 84.46% of the frames.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
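The statement above localises tooltips from a tool segmentation rather than from tip annotations. Purely as an illustration, and not the hybrid method of the cited work, one classical-vision heuristic is to take the segmented tool pixels, estimate where the tool enters the frame, and report the mask pixel farthest from that entry point as the tip. The function tooltip_from_mask below sketches this under those assumptions with NumPy; the name and the heuristic itself are hypothetical.

```python
import numpy as np

def tooltip_from_mask(mask):
    """Toy tooltip guess from a non-empty binary tool mask:
    the tool pixel farthest from the tool's entry point at the image border."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([ys, xs], axis=1).astype(float)
    h, w = mask.shape

    # Entry point: the tool pixel closest to any image border
    # (surgical tools typically extend in from the edge of the frame).
    border_dist = np.minimum.reduce([pts[:, 0], pts[:, 1],
                                     h - 1 - pts[:, 0], w - 1 - pts[:, 1]])
    entry = pts[np.argmin(border_dist)]

    # Tooltip: the tool pixel farthest from that entry point.
    tip = pts[np.argmax(np.linalg.norm(pts - entry, axis=1))]
    return int(tip[0]), int(tip[1])
```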
“…In this work, we proposed a DL-based framework to enhance the visibility of clinical needles with PA imaging for guiding minimally invasive procedures. As clinical needles have relatively simple geometries whilst background biological tissues such as blood vessels are complex, as opposed to using purely synthetic data [46], [47], [48], [49], a hybrid method was proposed for generating semi-synthetic datasets [50]. The DL model was trained and validated using such semi-synthetic datasets and blind to the test data obtained from tissue-mimicking phantoms, ex vivo tissue and human fingers in vivo.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…However, despite the unprecedented results provided by Deep Learning, the problem is still far from being solved for real-world applications: current state-of-the-art Deep Learning approaches rely heavily on manual annotations, which are expensive to obtain at a scale large enough to allow generalization to real-world scenarios. Alternatives to standard in-house annotate & train pipelines have been proposed, trying to address the annotation problem by cutting the cost of labels, for example by acquiring them through crowd-sourcing platforms (Maier-Hein et al., 2016) or by generating semi-synthetic datasets with automatically obtained labels (Garcia-Peraza-Herrera et al., 2021). General object segmentation has been tackled in an unsupervised way when video data are available, such as in Video Object Segmentation (VOS), mainly by leveraging the hypothesis of incoherent background motion, uncorrelated with the foreground (Yang et al., 2019a).…”
Section: Introduction
Citation type: mentioning (confidence: 99%)