CataNet: Predicting Remaining Cataract Surgery Duration

Marafioti, Andrés; Hayoz, Michel; Gallardo, Mathias; Neila, Pablo Márquez; Wolf, Sebastián; Zinkernagel, Martin; Sznitman, Raphael

doi:10.1007/978-3-030-87202-1_41

Cited by 14 publications

(6 citation statements)

References 21 publications


“…Still however, sequences of only 10 frames are trained end to end. CataNet [38] proposes a complex, 4-stage learning process for ResNet-LSTMs to predict surgery duration. In the first two stages, ResNet and LSTM are trained separately, followed by an end-to-end stage and repeated finetuning of the LSTM.…”

Section: Surgical Workflow Analysis · mentioning · confidence: 99%
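The quoted passage concerns predicting remaining surgery duration (RSD). The following is a minimal sketch of how per-frame RSD regression targets can be derived. It assumes frames are sampled at a fixed rate and that the recording spans the whole surgery. The helper name `remaining_duration_targets` is hypothetical, and CataNet's exact target units or normalization are not given here.

```python
from __future__ import annotations


def remaining_duration_targets(num_frames: int, fps: float) -> list[float]:
    """Remaining surgery duration (in seconds) at each sampled frame.

    Hypothetical helper: frame i is assumed to be captured at time i / fps,
    and the surgery is assumed to end at num_frames / fps.
    """
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    total = num_frames / fps  # total surgery duration in seconds
    # Remaining time = total duration minus time already elapsed.
    return [total - i / fps for i in range(num_frames)]


# Four frames at 2 fps give a 2-second video:
print(remaining_duration_targets(4, 2.0))  # [2.0, 1.5, 1.0, 0.5]
```

A model such as a CNN-LSTM then regresses these values frame by frame, so the targets decrease linearly as the surgery progresses.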

“…Nwoye et al [39] justify their 2-stage approach through "fair comparison". And the authors of CataNet [38] deactivate BN in their public code repository (footnote 3) but do not discuss this in the paper. Rivoir et al [41] briefly mention BatchNorm's "cheating" to justify their choice of an AlexNet backbone for instrument anticipation.…”

Section: Surgical Workflow Analysis · mentioning · confidence: 99%

“…[8], and only the temporal model is optimized [1,13,22,25,30,42,54,60]. However, in specialized domains such as surgical video, well-pretrained CNNs may not be available, meaning that the CNN needs to be finetuned, either in a 2-stage [4,16,29,38,39,61] or in an end-to-end (E2E) training setting. The latter seems preferable since visual and temporal features can be learned jointly and in a more complete context, but BN layers in CNN backbones may cause problems.…”

Section: Introduction · mentioning · confidence: 99%


On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis

Rivoir, Funke, Speidel · 2022 · Preprint


Batch Normalization's (BN) unique property of depending on other samples in a batch is known to cause problems in several tasks, including sequential modeling, and has led to the use of alternatives in these fields. In video learning, however, these problems are less studied, despite the ubiquitous use of BN in CNNs for visual feature extraction. We argue that BN's properties create major obstacles for training CNNs and temporal models end to end in video tasks. Yet, end-to-end learning seems preferable in specialized domains such as surgical workflow analysis, which lack well-pretrained feature extractors. While previous work in surgical workflow analysis has avoided BN-related issues through complex, multi-stage learning procedures, we show that even simple, end-to-end CNN-LSTMs can outperform the state of the art when CNNs without BN are used. Moreover, we analyze in detail when BN-related issues occur, including a "cheating" phenomenon in surgical anticipation tasks. We hope that a deeper understanding of BN's limitations and a reconsideration of end-to-end approaches can be beneficial for future research in surgical workflow analysis and general video learning.
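The batch dependence described in the abstract can be seen with a few scalars. This minimal pure-Python sketch is not the authors' code, and the function names are hypothetical. It contrasts training-mode BN, which normalizes with statistics of the current batch, with frozen BN, which uses fixed statistics. The same input value produces different outputs depending on its batch mates. If a batch holds consecutive frames of one video, a frame's normalized features can therefore carry information about other frames in that batch, including later ones.

```python
import math


def batch_norm_train(batch, eps=1e-5):
    """Training-mode BN on scalar features: each value is normalized with the
    mean and (biased) variance of the *current* batch."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


def batch_norm_frozen(batch, running_mean, running_var, eps=1e-5):
    """Frozen (inference-mode) BN: fixed statistics, so each output depends
    only on its own input and never on other samples in the batch."""
    return [(x - running_mean) / math.sqrt(running_var + eps) for x in batch]


sample = 1.0
# Same sample, two different batch contexts.
out_a = batch_norm_train([sample, 0.0, 2.0])[0]  # 0.0 (sample equals batch mean)
out_b = batch_norm_train([sample, 5.0, 9.0])[0]  # about -1.22
print(out_a, out_b)
```

With fixed statistics (`batch_norm_frozen`), both contexts yield the same output for `sample`. This is one way to remove the cross-sample dependence; replacing BN with a batch-independent normalization is another.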


“…To help train future surgeons and optimize surgical workflows, automated methods that analyze cataract surgery videos have gained significant traction in the last decade. With the prospect of reducing intra-operative and post-operative complications [5], recent methods have included surgical skill assessment [8,26], remaining surgical time estimation [13], irregularity detection [7] or relevance-based compression [6]. In addition, a reliable relevant-instance-segmentation approach is often a prerequisite for a majority of these applications [17].…”

Section: Introduction · mentioning · confidence: 99%

DeepPyramid: Enabling Pyramid View and Deformable Pyramid Reception for Semantic Segmentation in Cataract Surgery Videos

Ghamsarian, Taschwer, Sznitman, et al. · 2022 · Preprint · Self-citation


Semantic segmentation in cataract surgery has a wide range of applications contributing to surgical outcome enhancement and clinical risk reduction. However, the varying issues in segmenting the different relevant structures in these surgeries make designing a single network quite challenging. This paper proposes a semantic segmentation network, termed DeepPyramid, that can deal with these challenges using three novelties: (1) a Pyramid View Fusion module which provides a varying-angle global view of the surrounding region centering at each pixel position in the input convolutional feature map; (2) a Deformable Pyramid Reception module which enables a wide deformable receptive field that can adapt to geometric transformations in the object of interest; and (3) a dedicated Pyramid Loss that adaptively supervises multi-scale semantic feature maps. We show that, combined, these modules can effectively boost semantic segmentation performance, especially in the case of transparency, deformability, scalability, and blunt edges in objects. We demonstrate that our approach performs at a state-of-the-art level and outperforms a number of existing methods by a large margin (3.66% overall improvement in intersection over union compared to the best rival approach).
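DeepPyramid's headline result is reported in intersection over union (IoU). The following is a minimal sketch of per-class IoU and its class-averaged mean (mIoU) for label maps stored as 2-D lists of class ids. How the paper aggregates IoU across classes and videos is not stated here, so the simple averaging in the hypothetical `mean_iou` helper is an assumption.

```python
def iou_per_class(pred, target, num_classes):
    """Per-class IoU = |pred==c AND target==c| / |pred==c OR target==c|.

    pred and target are equal-sized 2-D lists of integer class ids.
    Classes absent from both maps get None (IoU undefined).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            if p == t:
                # Correct pixel: counts once in both intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # Wrong pixel: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return [i / u if u else None for i, u in zip(inter, union)]


def mean_iou(pred, target, num_classes):
    """Average IoU over classes that appear in pred or target (assumed rule)."""
    scores = [s for s in iou_per_class(pred, target, num_classes) if s is not None]
    if not scores:
        raise ValueError("no class present in either map")
    return sum(scores) / len(scores)


pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]
print(iou_per_class(pred, target, 2))  # [0.5, 0.6666666666666666]
```

Under this convention, a gain "in IoU" can mean either a change in percentage points of mIoU or a relative change. The abstract does not say which.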


“…Currently, various works have investigated anticipation of surgical workflow [2]–[8]. Most works are solely based on pixel-level visual features extracted by ResNet [9] and similar backbones, and learn these features directly with temporal models [10].…”

Section: Introduction · mentioning · confidence: 99%

Towards Graph Representation Learning Based Surgical Workflow Anticipation

Zhang, Moubayed, Shum · 2022 · Preprint


Surgical workflow anticipation can predict which steps to conduct or which instruments to use next, which is an essential part of computer-assisted intervention systems for surgery, e.g. workflow reasoning in robotic surgery. However, current approaches are limited by their insufficient power to express relationships between instruments. Hence, we propose a graph representation learning framework to comprehensively represent instrument motions in the surgical workflow anticipation problem. In our proposed graph representation, we map the bounding box information of instruments to the graph nodes in consecutive frames and build inter-frame/inter-instrument graph edges to represent the trajectory and interaction of the instruments over time. This design enhances the ability of our network to model both the spatial and temporal patterns of surgical instruments and their interactions. In addition, we design a multi-horizon learning strategy to balance the understanding of various horizons in different anticipation tasks, which significantly improves the model performance in anticipation with various horizons. Experiments on the Cholec80 dataset demonstrate that our proposed method can outperform the state-of-the-art method based on richer backbones, especially in instrument anticipation (1.27 vs. 1.48 for inMAE; 1.48 vs. 2.68 for eMAE). To the best of our knowledge, we are the first to introduce a spatial-temporal graph representation into surgical workflow anticipation.
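The graph construction described in the abstract can be sketched as follows. Nodes hold instrument bounding boxes per frame. Inter-instrument edges link boxes within a frame, and inter-frame edges link boxes across consecutive frames. This is a hedged illustration under our own assumptions, not the authors' implementation. The `Detection` type, the edge rules, and the use of raw box coordinates as node features are all hypothetical choices. The edge rules are: all pairs of distinct instruments in one frame, and the same instrument in frames t and t + 1.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    frame: int                               # frame index t
    instrument: str                          # instrument class, e.g. "grasper"
    box: tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed normalized


def build_instrument_graph(detections):
    """Return (node_features, edges) for a hypothetical spatio-temporal graph.

    Node i holds the bounding box of detections[i].
    Edges are (i, j, kind) tuples:
      * "inter_instrument": two different instruments in the same frame
      * "inter_frame": the same instrument in frames t and t + 1
    """
    node_features = [list(d.box) for d in detections]
    by_frame = defaultdict(list)
    for i, d in enumerate(detections):
        by_frame[d.frame].append(i)

    edges = []
    for frame in sorted(by_frame):
        # Spatial interaction edges within one frame.
        for i, j in combinations(by_frame[frame], 2):
            if detections[i].instrument != detections[j].instrument:
                edges.append((i, j, "inter_instrument"))
        # Temporal trajectory edges to the next frame (.get avoids adding keys).
        for i in by_frame[frame]:
            for j in by_frame.get(frame + 1, []):
                if detections[i].instrument == detections[j].instrument:
                    edges.append((i, j, "inter_frame"))
    return node_features, edges


dets = [
    Detection(0, "grasper", (0.1, 0.1, 0.2, 0.2)),
    Detection(0, "hook", (0.5, 0.5, 0.6, 0.6)),
    Detection(1, "grasper", (0.12, 0.1, 0.22, 0.2)),
]
feats, edges = build_instrument_graph(dets)
print(edges)  # [(0, 1, 'inter_instrument'), (0, 2, 'inter_frame')]
```

A graph neural network would then pass messages along both edge types. Keeping the edge kind explicit lets spatial and temporal relations use separate weights.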

