2022
DOI: 10.1007/978-3-031-04881-4_5

From Captions to Explanations: A Multimodal Transformer-based Architecture for Natural Language Explanation Generation

citations
Cited by 1 publication
(1 citation statement)
references
References 16 publications
“…In multi-label classification, the test radiological image is associated with multiple labels or concepts [20]-[30]. Most multi-label classification approaches, fine-tune a pre-trained CNN to predict if the images contain the given concepts present in the dataset [31]-[37]. Other approaches perform multi-label classification using deep features extracted from transformers [38]-[41].…”
Section: Related Work (mentioning)
confidence: 99%