2022
DOI: 10.1007/978-3-031-16449-1_4

Surgical-VQA: Visual Question Answering in Surgical Scenes Using Transformer

Cited by 15 publications (4 citation statements) · References 22 publications
“…The model of (8) employs vision-text transformers and a residual MLP-based VisualBERT encoder to improve performance in classification-based answering. A multi-modal transformer-based architecture, VB-Fusion, is proposed by (9) to learn joint representations by combining modality-specific features with multi-modal transformer layers.…”
Section: Transformer-Based Approaches for Visual Question Answering
Citation type: mentioning · confidence: 99%
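The fusion pattern this citation describes can be illustrated with a minimal PyTorch sketch. This is an assumption-laden toy, not the code of (8) or (9): the class names `VQAFusionClassifier` and `ResidualMLP`, the dimensions, and the pooling choice are all invented here to show the general recipe of concatenating text-token and visual-feature embeddings, fusing them with shared transformer layers, and classifying the pooled output through a residual MLP head.

```python
# Minimal sketch (hypothetical names/dims, not the cited papers' code) of
# classification-based VQA fusion: text tokens and projected visual features
# share transformer layers, and a residual MLP head predicts an answer class.
import torch
import torch.nn as nn

class ResidualMLP(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return self.norm(x + self.net(x))  # residual connection around the MLP

class VQAFusionClassifier(nn.Module):
    def __init__(self, dim=256, vocab=1000, n_answers=18, n_layers=2):
        super().__init__()
        self.text_embed = nn.Embedding(vocab, dim)
        self.vis_proj = nn.Linear(512, dim)  # project visual features to shared width
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Sequential(ResidualMLP(dim, 4 * dim),
                                  nn.Linear(dim, n_answers))

    def forward(self, token_ids, visual_feats):
        # Concatenate text and vision tokens, fuse jointly, pool, classify.
        tokens = torch.cat([self.text_embed(token_ids),
                            self.vis_proj(visual_feats)], dim=1)
        fused = self.fusion(tokens)           # joint multi-modal representation
        return self.head(fused.mean(dim=1))   # pooled features -> answer logits

# Toy usage: 2 questions of 12 tokens, 2 frames with 49 visual feature vectors.
logits = VQAFusionClassifier()(torch.randint(0, 1000, (2, 12)),
                               torch.randn(2, 49, 512))
print(logits.shape)  # torch.Size([2, 18])
```

Treating answering as classification over a fixed answer set, as in the quoted works, sidesteps free-form text generation and reduces the head to a standard softmax classifier.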
“…The ViT architecture achieves strong performance on vision tasks because self-attention lets it capture long-range dependencies and complex visual patterns. (43) introduces the Language-Vision GPT (LV-GPT) model, which incorporates vision input for VQA in surgery. It utilizes ViT, ResNet18, and Swin for feature extraction.…”
Section: Vision Transformers for Visual Question Answering
Citation type: mentioning · confidence: 99%
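As a hedged illustration of the backbone-swapping this citation attributes to LV-GPT, the sketch below uses the `timm` library (an assumption on my part; the paper's own release may differ) to pull spatial features from either a ViT or a ResNet18 and reshape them into the token sequences a language model could attend over. Swin backbones can be created the same way via `timm.create_model`, but their feature-map layout varies across timm versions, so they are left out of the runnable loop.

```python
# Illustrative sketch only -- the backbone names are standard timm model ids,
# not necessarily the checkpoints used by LV-GPT.
import timm
import torch

def extract_vision_tokens(name: str, images: torch.Tensor) -> torch.Tensor:
    # num_classes=0 strips the classification head; forward_features returns
    # spatial features rather than a single pooled vector.
    model = timm.create_model(name, pretrained=False, num_classes=0).eval()
    with torch.no_grad():
        feats = model.forward_features(images)
    # CNN backbones yield (B, C, H, W) maps; flatten to (B, H*W, C) so the
    # output matches the (B, N, C) token layout a ViT produces.
    if feats.ndim == 4:
        feats = feats.flatten(2).transpose(1, 2)
    return feats

images = torch.randn(1, 3, 224, 224)
for backbone in ["vit_base_patch16_224", "resnet18"]:
    tokens = extract_vision_tokens(backbone, images)
    print(backbone, tuple(tokens.shape))  # (1, 197, 768) and (1, 49, 512)
```

Normalizing every backbone's output to one token-sequence shape is what makes the vision encoder swappable without touching the language-model side.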
“…It covers 12 diseases on 39 human parts. The medical datasets in 2022 are OVQA dataset [38], Diabetic Macular Edema (DME) dataset [28], EndoVis-18-VQA [39], and Cholec80-VQA [39].…”
Section: B. Existing VQA Datasets
Citation type: mentioning · confidence: 99%