From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments

Baby, Britty; Thapar, Daksh; Chasmai, Mustafa; Banerjee, Tamajit; Dargan, Kunal; Suri, Ashish; Banerjee, Subhashis; Arora, Chetan

doi:10.1109/wacv56688.2023.00613

Cited by 13 publications

(3 citation statements)

References 44 publications

“…Comparison methods. We have involved several classical and recent methods, including the vanilla UNet [13], TernausNet [10], MF-TAPNet [12], Islam et al [14], Wang et al [15], ST-MTL [16], S-MTL [17], AP-MTL [18], ISINet [11], TraSeTR [19], and S3Net [20] for surgical binary and instrument-wise segmentation. The ViT-H-based SAM [2] is employed in all our investigations.…”

Section: Type Methods (mentioning)

confidence: 99%

Dual Regression-Enhanced Gaze Target Detection in the Wild

Wang, Zhang, Wang, et al. 2024. IEEE Trans. Cybern.

Segment Anything Model (SAM) is a foundation model for semantic segmentation and shows excellent generalization capability with the prompts. In this empirical study, we investigate the robustness and zero-shot generalizability of the SAM in the domain of robotic surgery in various settings of (i) prompted vs. unprompted; (ii) bounding box vs. points-based prompt; (iii) generalization under corruptions and perturbations with five severity levels; and (iv) state-of-the-art supervised model vs. SAM. We conduct all the observations with two well-known robotic instrument segmentation datasets of MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation results reveal that although SAM shows remarkable zero-shot generalization ability with bounding box prompts, it struggles to segment the whole instrument with point-based prompts and unprompted settings. Furthermore, our qualitative figures demonstrate that the model either failed to predict the parts of the instrument mask (e.g., jaws, wrist) or predicted parts of the instrument as different classes in the scenario of overlapping instruments within the same bounding box or with the point-based prompt. In fact, it is unable to identify instruments in some complex surgical scenarios of blood, reflection, blur, and shade. Additionally, SAM is insufficiently robust to maintain high performance when subjected to various forms of data corruption. Therefore, we can argue that SAM is not ready for downstream surgical tasks without further domain-specific fine-tuning.

“…ISINet introduces mask classification to instrument segmentation with Mask-RCNN (González, Bravo-Sánchez, and Arbelaez 2020; He et al 2017). Later, Baby et al (2023) improve its classification performance by designing a specialised classification module. In addition, TraSeTR integrates tracking cues with a track-to-segment transformer (Zhao, Jin, and Heng 2022) and MATIS incorporates temporal consistency with Mask2Former (Ayobi et al 2023; Cheng et al 2022).…”

Section: Related Work, Surgical Instrument Segmentation (mentioning)

confidence: 99%

“…We use the EndoVis2018 (Allan et al 2020) For evaluation, we follow prior research and adopt three segmentation metrics: Challenge IoU (Allan et al 2019), IoU, and mean class IoU (mc IoU) (González, Bravo-Sánchez, and Arbelaez 2020; Baby et al 2023; Ayobi et al 2023). The efficiency of our method is evaluated in terms of training speed, training GPU usage, and inference speed.…”

Section: Datasets and Evaluation (mentioning)

confidence: 99%
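
The metrics named in the citation statement above are all built on intersection over union. The following is a minimal sketch of per-class IoU and a simple mean over classes, assuming masks are 2D grids of integer class labels with 0 as background. The function names, the `Mask` alias, and the rule of skipping classes absent from both masks are illustrative assumptions; the exact averaging conventions of Challenge IoU, IoU, and mc IoU in the EndoVis benchmarks are defined in the cited papers and are not reproduced here.

```python
from typing import List, Optional

# Assumption: a mask is a 2D grid of integer class labels, 0 = background.
Mask = List[List[int]]


def class_iou(pred: Mask, gt: Mask, cls: int) -> Optional[float]:
    """IoU for one class; None if the class is absent from both masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            in_pred = p == cls
            in_gt = g == cls
            inter += int(in_pred and in_gt)
            union += int(in_pred or in_gt)
    return inter / union if union else None


def mean_class_iou(pred: Mask, gt: Mask, classes: List[int]) -> float:
    """Average per-class IoU over the classes that occur in pred or gt."""
    scores = []
    for c in classes:
        score = class_iou(pred, gt, c)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 2, 2]]
    gt = [[1, 0, 0],
          [0, 2, 2]]
    print(class_iou(pred, gt, 1))               # 0.5
    print(class_iou(pred, gt, 2))               # 1.0
    print(mean_class_iou(pred, gt, [1, 2, 3]))  # 0.75 (class 3 is skipped)
```

Skipping absent classes keeps an instrument type that never appears in a frame from dragging the average down; whether a given benchmark does this per frame or over the whole dataset is one of the points on which the three metrics may differ.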

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

Yue, Zhang, et al. 2024. AAAI

The Segment Anything Model (SAM) is a powerful foundation model that has revolutionised image segmentation. To apply SAM to surgical instrument segmentation, a common approach is to locate precise points or boxes of instruments and then use them as prompts for SAM in a zero-shot manner. However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline. To address these problems, we introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to effectively integrate surgical-specific information with SAM’s pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes and eliminates the use of explicit prompts for improved robustness and a simpler pipeline. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning, further enhancing the discrimination of the class prototypes for more accurate class prompting. The results of extensive experiments on both EndoVis2018 and EndoVis2017 datasets demonstrate that SurgicalSAM achieves state-of-the-art performance while only requiring a small number of tunable parameters. The source code is available at https://github.com/wenxi-yue/SurgicalSAM.
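
To make the idea of class prototypes concrete, here is a minimal sketch of nearest-prototype matching by cosine similarity. It is not the SurgicalSAM prompt encoder or its contrastive loss, which the abstract describes only at a high level. The function names, the two-dimensional vectors, and the class names `forceps` and `scissors` are hypothetical and chosen purely for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_prototype(feature: Vector, prototypes: Dict[str, Vector]) -> str:
    """Return the name of the class prototype most similar to the feature."""
    return max(prototypes, key=lambda name: cosine(feature, prototypes[name]))


if __name__ == "__main__":
    # Hypothetical prototypes; real ones would be learned embeddings.
    prototypes = {"forceps": [1.0, 0.0], "scissors": [0.0, 1.0]}
    print(nearest_prototype([0.9, 0.1], prototypes))  # forceps
```

When prototypes for different instrument classes point in nearly the same direction, their similarities to a given feature become hard to tell apart, which is one way to picture the low inter-class variance that the abstract says contrastive prototype learning is meant to counter.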
