Abstract. Much of the existing work on automatic classification of gestures and skill in robotic surgery relies on kinematic and dynamic cues, such as time to completion, speed, forces, torque, or robot trajectories. In this paper we show that, in a typical surgical training setup, video data can be equally discriminative. To that end, we propose and evaluate three approaches to surgical gesture classification from video. In the first, we model each video clip of a surgical gesture as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new clips. In the second, we extract spatio-temporal features from each video clip, learn a dictionary of spatio-temporal words, and use a bag-of-features (BoF) representation to classify new clips. In the third, we use multiple kernel learning to combine the LDS and BoF approaches. Our experiments show that these video-based methods perform on par with state-of-the-art approaches based on kinematic data.
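
To make the second (bag-of-features) approach concrete, the sketch below illustrates the general BoF pipeline for video clips: cluster local spatio-temporal descriptors into a codebook, represent each clip as a histogram of codebook words, and classify the histograms. This is only a minimal illustration under assumed choices; the synthetic descriptors, the codebook size, and the use of k-means and an RBF-kernel SVM are placeholders, not the authors' exact implementation.

```python
# Minimal bag-of-features (BoF) sketch for video-clip classification.
# Synthetic descriptors stand in for real spatio-temporal features
# (e.g., cuboid/STIP descriptors); all sizes and models are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def extract_descriptors(num_clips, dim=32):
    # Placeholder: each clip yields a variable number of dim-dimensional descriptors.
    return [rng.normal(size=(rng.integers(50, 100), dim)) for _ in range(num_clips)]

train_clips = extract_descriptors(20)
train_labels = rng.integers(0, 3, size=len(train_clips))  # 3 hypothetical gesture classes
test_clips = extract_descriptors(5)

# 1. Learn a dictionary of "spatio-temporal words" by clustering all training descriptors.
codebook = KMeans(n_clusters=64, n_init=10, random_state=0)
codebook.fit(np.vstack(train_clips))

# 2. Represent each clip as a normalized histogram of word occurrences.
def bof_histogram(descriptors, codebook):
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

X_train = np.array([bof_histogram(c, codebook) for c in train_clips])
X_test = np.array([bof_histogram(c, codebook) for c in test_clips])

# 3. Classify new clips from their BoF histograms (here with a kernel SVM).
clf = SVC(kernel="rbf", gamma="scale").fit(X_train, train_labels)
print(clf.predict(X_test))
```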