2023
DOI: 10.1109/access.2022.3232719

Vision Transformer and Language Model Based Radiology Report Generation

Abstract: Recent advancements in transformers have been exploited for computer vision problems, resulting in state-of-the-art models. Transformer-based models have shown remarkable performance in various sequence prediction tasks such as language translation, sentiment classification, and caption generation. Automatic report generation for medical imaging through caption generation models is one of the applied scenarios for language models and has a strong social impact. In these models, convolution neural networks have been us…

Cited by 16 publications (2 citation statements) | References 25 publications
“…The transformer encoder [35] consists of a stack of N transformer layers; each transformer layer consists of a multi-head attention (MHA) layer [36] and a feed-forward neural network layer. The output of each layer is then fused with its input using a residual connection, and normalization is performed before the result is passed to the next layer.…”
Section: Transformer Model
confidence: 99%
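
For readers less familiar with this architecture, the sketch below illustrates the encoder layer the citing paper describes: multi-head attention followed by a feed-forward network, with each sublayer's output fused with its input through a residual connection and then normalized before entering the next layer. This is a minimal illustrative PyTorch version, not the cited paper's implementation; the model dimensions, dropout rate, and layer count are assumed for demonstration.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    # One transformer encoder layer: MHA + feed-forward, each with a
    # residual connection whose sum is normalized before moving on
    # (the post-LayerNorm arrangement described in the excerpt above).
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads,
                                         dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # Multi-head self-attention; its output is fused with the
        # layer input via a residual connection, then normalized.
        attn_out, _ = self.mha(x, x, x)
        x = self.norm1(x + self.drop(attn_out))
        # Feed-forward sublayer with the same residual + norm pattern.
        x = self.norm2(x + self.drop(self.ff(x)))
        return x

# The full encoder is a stack of N such layers (N=6 assumed here).
encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])
x = torch.randn(2, 50, 512)   # (batch, sequence length, d_model)
print(encoder(x).shape)       # torch.Size([2, 50, 512])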