2022
DOI: 10.1186/s12880-022-00800-x

BPI-MVQA: a bi-branch model for medical visual question answering

Abstract: Background: Visual question answering in the medical domain (VQA-Med) exhibits great potential for enhancing confidence in diagnosing diseases and helping patients better understand their medical conditions. One of the challenges in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays, magnetic resonance imaging (MRI)) and answer the corresponding questions accurately on unlabeled medical datasets. Method: …



Cited by 18 publications (8 citation statements). References 43 publications (47 reference statements).
“…Then, these models were transferred into a lightweight student model for fine-tuning radiological images of VQA-Med datasets. Another study [28] proposed a bi-branched model based on parallel networks and image retrieval for VQA-Med (BPI-MVQA) to realize complementary advantages in image sequence feature extraction, spatial feature extraction, and multi-modal fusion, forcing the VQA-Med to consider the feature applicability to specific image-understanding tasks [29].…”
Section: Related Work
confidence: 99%
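The complementary pairing this quote describes (image sequence features, spatial features, and multi-modal fusion) can be made concrete with a small sketch. Everything below, including the module name, the projection layers, and the dimensions, is an illustrative PyTorch assumption, not the published BPI-MVQA code:

```python
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    """Fuse sequence-level and spatial image features with a question embedding."""

    def __init__(self, seq_dim=1024, spatial_dim=2048, question_dim=768,
                 fused_dim=1024):
        super().__init__()
        # Project each modality into a shared space before combining.
        self.seq_proj = nn.Linear(seq_dim, fused_dim)
        self.spatial_proj = nn.Linear(spatial_dim, fused_dim)
        self.question_proj = nn.Linear(question_dim, fused_dim)
        self.fuse = nn.Sequential(nn.Linear(3 * fused_dim, fused_dim), nn.ReLU())

    def forward(self, seq_feat, spatial_feat, question_feat):
        # seq_feat:      (B, seq_dim) image sequence features
        # spatial_feat:  (B, spatial_dim) spatial features
        # question_feat: (B, question_dim) question embedding
        parts = [self.seq_proj(seq_feat),
                 self.spatial_proj(spatial_feat),
                 self.question_proj(question_feat)]
        return self.fuse(torch.cat(parts, dim=-1))
```

Simple concatenation after per-modality projection is only one of many fusion choices; attention-based fusion would slot into the same interface.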
“…We compare our CCIS-MVQA model with some SOTA methods: MEVF [25], CPRD [27], BPI-MVQA [28], CGMVQA [30], QC-MLB [32], QFPN [35], Caption-Aware [36], AOM [39], and Optimal Model [50]. We have reviewed these models in Section II.…”
Section: Evaluation of the Overall Performance
confidence: 99%
“…The model predicts the answer either by a classification or a generation head depending on the type of question. Finally, in [50], a bi-branched model is proposed in which the first branch answers closed-ended questions with a transformer architecture, and the second branch answers open-ended questions with image retrieval that gives the most similar answer to the test image.…”
Section: Related Work
confidence: 99%
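A minimal sketch of the bi-branch answering strategy in this quote, assuming a classification head for closed-ended questions and nearest-neighbour image retrieval for open-ended ones. The function name, its arguments, and the use of cosine similarity for retrieval are hypothetical choices for illustration:

```python
import torch
import torch.nn.functional as F

def answer_question(question_type, fused_feature, image_feature,
                    classifier, bank_features, bank_answers):
    """Route a question to the classification or the retrieval branch.

    question_type:  "closed" or "open"
    fused_feature:  (D,) fused image-question feature
    image_feature:  (D,) image feature used for retrieval
    classifier:     module mapping (D,) -> logits over a fixed answer vocabulary
    bank_features:  (N, D) features of reference images with known answers
    bank_answers:   list of N answer strings aligned with bank_features
    """
    if question_type == "closed":
        # Branch 1: closed-ended questions as classification over the vocabulary.
        logits = classifier(fused_feature)
        return int(logits.argmax())
    # Branch 2: open-ended questions via image retrieval; return the answer
    # attached to the most similar reference image.
    sims = F.cosine_similarity(image_feature.unsqueeze(0), bank_features, dim=-1)
    return bank_answers[int(sims.argmax())]
```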
“…A few of the difficulties that VQA-Med encounters include the requirement for special processing of medical-specific vocabulary in medical texts and images, a challenge in combining multi-modal features at various levels of medical texts and images, and a propensity to ignore the relationship between the question and the visual information deduced from the text semantics. The VQA model was presented by [61] with two branches. The model uses a transformer structure for the common classification problem, three embedding methods, a hierarchy of feature extractors, a parallel structure of GRU and ResNet152 as image feature extractors, and specialized segmentation symbols as input.…”
Section: WordNet
confidence: 99%
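The parallel GRU and ResNet152 image feature extractors mentioned in this quote could be sketched as follows. Treating the flattened CNN feature grid as the GRU's input sequence is an assumption made here for illustration, not necessarily how the model in [61] wires the two branches:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet152

class ParallelImageEncoder(nn.Module):
    """Run a ResNet152 spatial branch and a GRU sequence branch in parallel."""

    def __init__(self, hidden_size=1024):
        super().__init__()
        backbone = resnet152(weights=None)
        # Keep the backbone up to its final convolutional maps:
        # (B, 2048, 7, 7) for 224x224 inputs.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.gru = nn.GRU(input_size=2048, hidden_size=hidden_size,
                          batch_first=True)

    def forward(self, images):
        maps = self.cnn(images)                # (B, 2048, H, W)
        spatial = maps.mean(dim=(2, 3))        # (B, 2048) pooled spatial feature
        # Assumption: flatten the grid into a sequence of H*W positions so
        # the GRU can extract sequence-level structure.
        seq = maps.flatten(2).transpose(1, 2)  # (B, H*W, 2048)
        _, h_n = self.gru(seq)                 # h_n: (1, B, hidden_size)
        return h_n.squeeze(0), spatial         # sequence feature, spatial feature
```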