Content-based processing and analysis of endoscopic images and videos: A survey

Münzer, Bernd; Schoeffmann, Klaus; Böszörményi, László

doi:10.1007/s11042-016-4219-z

Cited by 112 publications

(50 citation statements)

References 270 publications

(277 reference statements)

“…single frame, early fusion, late fusion, and slow fusion [6]. The importance of deep learning in medical image analysis and content-based processing and analysis of endoscopic images and video also is apparent from the work of Litjens et al [9] and Muenzer et al [12] respectively.…”

Section: Introduction (mentioning)

confidence: 99%

Learning laparoscopic video shot classification for gynecological surgery

Petscharnig

Schöffmann

2017

Multimed Tools Appl

Videos of endoscopic surgery are used for education of medical experts, analysis in medical research, and documentation for everyday clinical life. Hand-crafted image descriptors lack the capabilities of a semantic classification of surgical actions and video shots of anatomical structures. In this work, we investigate how well single-frame convolutional neural networks (CNNs) work for semantic shot classification in gynecologic surgery. Together with medical experts, we manually annotate hours of raw endoscopic gynecologic surgery videos showing endometriosis treatment and myoma resection of over 100 patients. The cleaned ground truth dataset comprises 9 h of annotated video material (from 111 different recordings). We use the well-known CNN architectures AlexNet and GoogLeNet and train these architectures from scratch for both surgical actions and anatomy. Furthermore, we extract high-level features from AlexNet with weights from a pre-trained model from the Caffe model zoo and feed them to an SVM classifier. Our evaluation shows that we reach an average recall of .697 and .515 for classification of anatomical structures and surgical actions, respectively, using off-the-shelf CNN features. Using GoogLeNet, we achieve a mean recall of .782 and .617 for classification of anatomical structures and surgical actions, respectively. With AlexNet, the achieved recall is .615 for anatomical structures and .469 for surgical actions. The main conclusion of our work is that advances in general image classification methods transfer to the domain of endoscopic surgery videos in gynecology. This is relevant as this domain is different from natural images, e.g., it is distinguished by smoke, reflections, or a limited range of colors.

“…These works include (i) pre-processing of images such as image enhancement [14,41] and content filtering [2,36], (ii) real-time support at procedure time such as diagnostic decision support and computer-integrated surgery [44,45], as well as (iii) post-procedural applications such as quality/skills assessment [31,51] and content-based retrieval [47,48]. A broad overview of such works is provided in an extensive survey by Muenzer et al [35]. While many works have been proposed for the first two categories mentioned above, content similarity search for supporting surgical quality assessment has been addressed only sparsely in the literature so far.…”

Section: Related Work (mentioning)

confidence: 99%

Video retrieval in laparoscopic video recordings with dynamic content descriptors

Schoeffmann

Husslein

Kletz

et al. 2017

Multimed Tools Appl

Self Cite

In the domain of gynecologic surgery an increasing number of surgeries are performed in a minimally invasive manner. These laparoscopic surgeries require specific psychomotor skills of the operating surgeon, which are difficult to learn and teach. This is the reason why an increasing number of surgeons promote checking video recordings of laparoscopic surgeries for the occurrence of technical errors with surgical actions. This manual surgical quality assessment (SQA) process, however, is very cumbersome and time-consuming when carried out without any support from content-based video retrieval. […] Descriptor) that can be effectively used to find similar segments in a laparoscopic video database and thereby help surgeons to more quickly inspect other instances of a given error scene. We evaluate the retrieval performance of MIDD with surgical actions from gynecologic surgery in direct comparison to several other dynamic content descriptors. We show that the MIDD descriptor significantly outperforms the state-of-the-art in terms of retrieval performance as well as in terms of runtime performance. Additionally, we release the manually created video dataset of 16 classes of surgical actions from medical laparoscopy to the public, for further evaluations.

“…Over the last decade, there is a growing research activity in the development of methods and systems that provide contextual information about surgical procedures, such as phase detection, instrument usage, event/task recognition, and skills performance [1]. Ultimately, the goal is to enhance the situation awareness of the operator, to improve surgical workflow, to minimize medical errors, and generally to optimize the management and quality of patient care [2]. The application of advanced data analysis algorithms, computing technologies, and imaging equipment in the operating room (OR) have played a key role towards this direction.…”

Section: Introduction (mentioning)

confidence: 99%

Multi‐instance multi‐label learning for surgical image annotation

Loukas

Sgouros

2019

Robotics Computer Surgery

Background: Various techniques have been proposed in the literature for phase and tool recognition from laparoscopic videos. In comparison, research in multilabel annotation of still frames is limited. Methods: We describe a framework for multilabel annotation of images extracted from laparoscopic cholecystectomy (LC) videos based on multi‐instance multiple‐label learning. The image is considered as a bag of features extracted from local regions after coarse segmentation. A method based on variational Bayesian Gaussian mixture models (VBGMM) is proposed for bag representation. Three techniques based on different feature extraction and bag representation models are employed for comparison. Results: Four anatomical structures (abdominal wall, gallbladder, fat, and liver bed) and a tool‐like object (specimen bag) were annotated in 482 images. Our method achieved the best performance on single label accuracy: 0.87 (highest) and 0.69 (lowest). Moreover, the performance was >20% higher in terms of four multilabel classification error metrics (one‐error, ranking‐loss, hamming‐loss, and coverage). Conclusions: Our approach provides an accurate and efficient image representation for multilabel classification of still images captured in LC.
