Abstract-Real-time tool segmentation from endoscopic videos is an essential part of many computer-assisted robotic surgical systems and of critical importance in robotic surgical data science. We propose two novel deep learning architectures for automatic segmentation of non-rigid surgical instruments. Both methods take advantage of automated deep-learningbased multi-scale feature extraction while trying to maintain an accurate segmentation quality at all resolutions. The two proposed methods encode the multi-scale constraint inside the network architecture. The first proposed architecture enforces it by cascaded aggregation of predictions and the second proposed network does it by means of a holistically-nested architecture where the loss at each scale is taken into account for the optimization process. As the proposed methods are for realtime semantic labeling, both present a reduced number of parameters. We propose the use of parametric rectified linear units for semantic labeling in these small architectures to increase the regularization of the network while maintaining the segmentation accuracy. We compare the proposed architectures against state-of-the-art fully convolutional networks. We validate our methods using existing benchmark datasets, including ex vivo cases with phantom tissue and different robotic surgical instruments present in the scene. Our results show a statistically significant improved Dice Similarity Coefficient over previous instrument segmentation methods. We analyze our design choices and discuss the key drivers for improving accuracy.
Abstract. Real-time tool segmentation is an essential component in computer-assisted surgical systems. We propose a novel real-time automatic method based on Fully Convolutional Networks (FCN) and optical flow tracking. Our method exploits the ability of deep neural networks to produce accurate segmentations of highly deformable parts along with the high speed of optical flow. Furthermore, the pre-trained FCN can be fine-tuned on a small amount of medical images without the need to hand-craft features. We validated our method using existing and new benchmark datasets, covering both ex vivo and in vivo real clinical cases where different surgical instruments are employed. Two versions of the method are presented, non-real-time and real-time. The former, using only deep learning, achieves a balanced accuracy of 89.6% on a real clinical dataset, outperforming the (non-real-time) state of the art by 3.8 percentage points. The latter, a combination of deep learning with optical flow tracking, yields an average balanced accuracy of 78.2% across all the validated datasets.
Purpose Fetoscopic laser photocoagulation is a minimally invasive surgical procedure used to treat twin-to-twin transfusion syndrome (TTTS), which involves localization and ablation of abnormal vascular connections on the placenta to regulate the blood flow in both fetuses. This procedure is particularly challenging due to the limited field of view, poor visibility, occasional bleeding, and poor image quality. Fetoscopic mosaicking can help in creating an image with the expanded field of view which could facilitate the clinicians during the TTTS procedure. Methods We propose a deep learning-based mosaicking framework for diverse fetoscopic videos captured from different settings such as simulation, phantoms, ex vivo, and in vivo environments. The proposed mosaicking framework extends an existing deep image homography model to handle video data by introducing the controlled data generation and consistent homography estimation modules. Training is performed on a small subset of fetoscopic images which are independent of the testing videos. Results We perform both quantitative and qualitative evaluations on 5 diverse fetoscopic videos (2400 frames) that captured different environments. To demonstrate the robustness of the proposed framework, a comparison is performed with the existing feature-based and deep image homography methods. Conclusion The proposed mosaicking framework outperformed existing methods and generated meaningful mosaic, while reducing the accumulated drift, even in the presence of visual challenges such as specular highlights, reflection, texture paucity, and low video resolution.
PurposeSmaller incisions and reduced surgical trauma made minimally invasive surgery (MIS) grow in popularity even though long training is required to master the instrument manipulation constraints. While numerous training systems have been developed in the past, very few of them tackled fetal surgery and more specifically the treatment of twin-twin transfusion syndrome (TTTS). To address this lack of training resources, this paper presents a novel mixed-reality surgical trainer equipped with comprehensive sensing for TTTS procedures. The proposed trainer combines the benefits of box trainer technology and virtual reality systems. Face and content validation studies are presented and a use-case highlights the benefits of having embedded sensors.MethodsFace and content validity of the developed setup was assessed by asking surgeons from the field of fetal MIS to accomplish specific tasks on the trainer. A small use-case investigates whether the trainer sensors are able to distinguish between an easy and difficult scenario.ResultsThe trainer was deemed sufficiently realistic and its proposed tasks relevant for practicing the required motor skills. The use-case demonstrated that the motion and force sensing capabilities of the trainer were able to analyze surgical skill.ConclusionThe developed trainer for fetal laser surgery was validated by surgeons from a specialized center in fetal medicine. Further similar investigations in other centers are of interest, as well as quality improvements which will allow to increase the difficulty of the trainer. The comprehensive sensing appeared to be capable of objectively assessing skill.
Twin-to-twin transfusion syndrome treatment requires fetoscopic laser photocoagulation of placental vascular anastomoses to regulate blood flow to both fetuses. Limited field-of-view (FoV) and low visual quality during fetoscopy make it challenging to identify all vascular connections. Mosaicking can align multiple overlapping images to generate an image with increased FoV, however, existing techniques apply poorly to fetoscopy due to the low visual quality, texture paucity, and hence fail in longer sequences due to the drift accumulated over time. Deep learning techniques can facilitate in overcoming these challenges. Therefore, we present a new generalized Deep Sequential Mosaicking (DSM) framework for fetoscopic videos captured from different settings such as simulation, phantom, and real environments. DSM extends an existing deep image-based homography model to sequential data by proposing controlled data augmentation and outlier rejection methods. Unlike existing methods, DSM can handle visual variations due to specular highlights and reflection across adjacent frames, hence reducing the accumulated drift. We perform experimental validation and comparison using 5 diverse fetoscopic videos to demonstrate the robustness of our framework.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.