2024
DOI: 10.3390/diagnostics14070681

A Multimodal Transformer Model for Recognition of Images from Complex Laparoscopic Surgical Videos

Rahib H. Abiyev, Mohamad Ziad Altabel, Manal Darwish, et al.

Abstract: The potential role and advantages of artificial intelligence-based models in surgery remain uncertain. This research is an initial step towards a multimodal model, inspired by the Video-Audio-Text Transformer, that aims to reduce adverse events and enhance patient safety. The model employs state-of-the-art image and text embedding models (ViT and BERT, respectively) to assess their efficacy in extracting hidden and distinctive features from surgical video frames. Th…
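The abstract does not state how the ViT image features and the BERT text features are combined. As a hedged illustration only, the following sketch assumes simple late fusion: a per-frame image embedding and a text embedding are concatenated, then passed through a linear layer and a softmax to score classes. The function names (`concat_fusion`, `linear`, `classify_frame`), the toy 2-dimensional embeddings, the two classes, and the weights are all hypothetical and are not the authors' method.

```python
import math
from typing import List


def concat_fusion(image_emb: List[float], text_emb: List[float]) -> List[float]:
    """Hypothetical late fusion: join a frame embedding (e.g. a ViT [CLS] vector)
    with a text embedding (e.g. a BERT [CLS] vector) by concatenation."""
    return list(image_emb) + list(text_emb)


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One logit per class: logits[c] = sum_i weights[c][i] * x[i] + bias[c]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_frame(image_emb: List[float], text_emb: List[float],
                   weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fuse the two modalities, then score each (hypothetical) class."""
    fused = concat_fusion(image_emb, text_emb)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused embedding length")
    return softmax(linear(fused, weights, bias))


if __name__ == "__main__":
    # Toy 2-d embeddings; real ViT-Base and BERT-Base vectors are 768-d.
    image_emb = [0.5, -1.0]
    text_emb = [1.0, 0.0]
    # Two hypothetical classes with made-up weights, for illustration only.
    weights = [[1.0, 0.0, 1.0, 0.0],
               [0.0, 1.0, 0.0, 1.0]]
    bias = [0.0, 0.0]
    print(classify_frame(image_emb, text_emb, weights, bias))
```

Concatenation is used here only because it is the simplest fusion scheme to show; the paper's actual fusion strategy may differ.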

Cited by 2 publications; references 27 publications.