Instrument detection, pose estimation, and tracking in surgical videos are an important vision component for computer-assisted interventions. While significant advances have been made in recent years, articulation detection is still a major challenge. In this paper, we propose a deep neural network for articulated multi-instrument 2-D pose estimation, which is trained on detailed annotations of endoscopic and microscopic data sets. Our model is formed by a fully convolutional detection-regression network. Joints and associations between joint pairs in our instrument model are located by the detection subnetwork and are subsequently refined through a regression subnetwork. Based on the output from the model, the poses of the instruments are inferred using maximum bipartite graph matching. Our estimation framework is powered by deep learning techniques without any direct kinematic information from a robot. Our framework is tested on single-instrument RMIT data, and also on multi-instrument EndoVis and in vivo data with promising results. In addition, the data set annotations are publicly released along with our code and model.
Detection of surgical instruments plays a key role in ensuring patient safety in minimally invasive surgery. In this paper, we present a novel method for 2D vision-based recognition and pose estimation of surgical instruments that generalizes to different surgical applications. At its core, we propose a novel scene model in order to simultaneously recognize multiple instruments as well as their parts. We use a Convolutional Neural Network architecture to embody our model and show that the cross-entropy loss is well suited to optimize its parameters which can be trained in an end-to-end fashion. An additional advantage of our approach is that instrument detection at test time is achieved while avoiding the need for scale-dependent sliding window evaluation. This allows our approach to be relatively parameter free at test time and shows good performance for both instrument detection and tracking. We show that our approach surpasses state-of-the-art results on in-vivo retinal microsurgery image data, as well as ex-vivo laparoscopic sequences.
In ophthalmology, retinal biological markers, or biomarkers, play a critical role in the management of chronic eye conditions and in the development of new therapeutics. While many imaging technologies used today can visualize these, Optical Coherence Tomography (OCT) is often the tool of choice due to its ability to image retinal structures in three dimensions at micrometer resolution. But with widespread use in clinical routine, and growing prevalence in chronic retinal conditions, the quantity of scans acquired worldwide is surpassing the capacity of retinal specialists to inspect these in meaningful ways. Instead, automated analysis of scans using machine learning algorithms provide a cost effective and reliable alternative to assist ophthalmologists in clinical routine and research. We present a machine learning method capable of consistently identifying a wide range of common retinal biomarkers from OCT scans. Our approach avoids the need for costly segmentation annotations and allows scans to be characterized by biomarker distributions. These can then be used to classify scans based on their underlying pathology in a device-independent way.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.