Surgical gesture recognition with time delay neural network based on kinematic data

Menegozzo, Giovanni; Dall'Alba, Diego; Zandonà, Chiara; Fiorini, Paolo

doi:10.1109/ismr.2019.8710178

“…While pooling operations help to increase the temporal receptive field of a network, they are also responsible for partial loss of fine-grained information and less precise identification of the gesture boundaries. Stacking multiple layers of dilated convolution with increasing dilation factor was proposed as alternative strategy to model long range temporal dependencies between surgical gestures [49]. Gesture predictions can be further refined in a multi-task, multi-stage framework where each stack of dilated convolutions is applied to the output of the previous stage, and the whole system is trained on the sum of all stage losses, as well as on the auxiliary task of surgical skill score prediction [50].…”

Section: A Convolutional Neural Network

mentioning

confidence: 99%
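The quoted statement about stacking dilated convolutions with increasing dilation factor can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from any of the cited papers: the function name `receptive_field`, the kernel size of 3, and the doubling dilation schedule are assumptions. It uses the standard result that, for stride-1 one-dimensional convolutions, each layer adds (kernel_size - 1) * dilation frames to the temporal receptive field.

```python
def receptive_field(kernel_size, dilations):
    """Temporal receptive field, in frames, of a stack of stride-1
    1-D convolutions, one layer per entry in `dilations`."""
    # Each layer extends the field by (kernel_size - 1) * dilation frames.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical configuration: four layers, kernel size 3.
doubling = [2 ** i for i in range(4)]   # dilations 1, 2, 4, 8
constant = [1, 1, 1, 1]                 # no dilation, same depth

print(receptive_field(3, doubling))     # 31 frames
print(receptive_field(3, constant))     # 9 frames
```

With the same depth and no pooling, the doubling schedule covers 31 frames instead of 9, which is how such a stack can model long-range dependencies without discarding temporal resolution.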

Gesture Recognition in Robotic Surgery: A Review

Amsterdam, Clarkson, Stoyanov 2021

IEEE Trans. Biomed. Eng.


Surgical activity recognition is a fundamental step in computer-assisted interventions. This paper reviews the state-of-the-art in methods for automatic recognition of fine-grained gestures in robotic surgery focusing on recent data-driven approaches and outlines the open questions and future research directions. Methods: An article search was performed on 5 bibliographic databases with combinations of the following search terms: robotic, robot-assisted, JIGSAWS, surgery, surgical, gesture, fine-grained, surgeme, action, trajectory, segmentation, recognition, parsing. Selected articles were classified based on the level of supervision required for training and divided into different groups representing major frameworks for time series analysis and data modelling. Results: A total of 52 articles were reviewed. The research field is showing rapid expansion, with the majority of articles published in the last 4 years. Deep-learning-based temporal models with discriminative feature extraction and multi-modal data integration have demonstrated promising results on small surgical datasets. Currently, unsupervised methods perform significantly less well than the supervised approaches. Conclusion: The development of large and diverse open-source datasets of annotated demonstrations is essential for development and validation of robust solutions for surgical gesture recognition. While new strategies for discriminative feature extraction and knowledge transfer, or unsupervised and semi-supervised approaches, can mitigate the need for data and labels, they have not yet been demonstrated to achieve comparable performance. Important future research directions include detection and forecast of gesture-specific errors and anomalies. Significance: This paper is a comprehensive and structured analysis of surgical gesture recognition methods aiming to summarize the status of this rapidly evolving field.


“…The recognition and segmentation of the robot's current action is one of the main pillars of the surgical state estimation process. Many models have been developed for the segmentation and recognition of fine-grained surgical actions that last for a few seconds, such as cutting [5][6][7][8], as well as surgical phases that last for up to 10 minutes, such as bladder dissection [9][10][11]. The recognition of fine-grained surgical states is particularly challenging due to their short duration and frequent state transitions.…”

Section: Introduction

mentioning

confidence: 99%

“…The Transition State Clustering (TSC) and Gaussian Mixture Model methods provide unsupervised or weakly-supervised methods for surgical trajectory segmentation [17,18]. More recently, deep learning methods have come to define the state-of-the-art, such as Temporal Convolutional Networks (TCN) [19], Time Delay Neural Network (TDNN) [7], and Long-Short Term Memory (LSTM) [6,20]. Instead of using robot kinematics data, vision-based methods have been developed based on Convolutional Neural Networks (CNN).…”

Section: Introduction

mentioning

confidence: 99%

Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources

Qin, Pedram, Feyzabadi et al. 2020

2020 IEEE International Conference on Robotics and Automation (ICRA)


Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources including the Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci® Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy up to 89.4%, which improves the state-of-the-art surgical state estimation models in both JIGSAWS suturing dataset and our RIOUS dataset.


“…The recognition and segmentation of the robot's current action is one of the main pillars of the surgical state estimation process. Many models have been developed for the segmentation and recognition of fine-grained surgical actions that last for a few seconds, such as cutting [5][6][7][8], as well as surgical phases that last for up to 10 minutes, such as bladder dissection [9][10][11]. The recognition of fine-grained surgical states is particularly challenging due to their short duration and frequent state transitions.…”

Section: Introduction

mentioning

confidence: 99%

Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources

Qin, Pedram, Feyzabadi et al. 2020

Preprint


Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources including the Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci® Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy up to 89.4%, which improves the state-of-the-art surgical state estimation models in both JIGSAWS suturing dataset and our RIOUS dataset.


Surgical gesture recognition with time delay neural network based on kinematic data

Cited by 14 publications

References 17 publications
