Hierarchical X-Ray Report Generation via Pathology Tags and Multi Head Attention

Srinivasan, Preethi; Thapar, Daksh; Bhavsar, Arnav; Nigam, Aditya

doi:10.1007/978-3-030-69541-5_36

Cited by 18 publications

(19 citation statements)

References 24 publications


“…Fueled by recent progresses in the closely related computer vision problem of image-based captioning (Vinyals et al, 2015; Tran et al, 2020), there have been a number of research efforts in medical report generation in recent years (Jing et al, 2018, 2019; Li et al, 2018; Xue et al, 2018; Yuan et al, 2019; Wang et al, 2018; Lovelace and Mortazavi, 2020; Srinivasan et al, 2020). These methods often perform reasonably well in addressing the language fluency aspect; on the other hand, as is also evidenced in our empirical evaluation, their results are notably less satisfactory in terms of clinical accuracy.…”

Section: Introduction (mentioning)

confidence: 76%

Automated Generation of Accurate & Fluent Medical X-ray Reports

Nguyen, Nie, Badamdorj et al. 2021

Preprint


Our paper focuses on automating the generation of medical reports from chest X-ray image inputs, a critical yet time-consuming task for radiologists. Unlike existing medical report generation efforts that tend to produce human-readable reports, we aim to generate medical reports that are both fluent and clinically accurate. This is achieved by our fully differentiable and end-to-end paradigm containing three complementary modules: taking the chest X-ray images and clinical history document of patients as inputs, our classification module produces an internal checklist of disease-related topics, referred to as enriched disease embedding; the embedding representation is then passed to our transformer-based generator, giving rise to the medical reports; meanwhile, our generator also produces the weighted embedding representation, which is fed to our interpreter to ensure consistency with respect to disease-related topics. Our approach achieved promising results on commonly-used metrics concerning language fluency and clinical accuracy. Moreover, noticeable performance gains are consistently observed when additional input information is available, such as the clinical document and extra scans of different views.


“…Dataset bias is a common problem in medical report generation as there are far more sentences describing normalities than abnormalities. To mitigate this bias, Srinivasan et al. [348] propose a hierarchical classification approach using a transformer as a decoder. Specifically, the transformer decoder leverages attention between and across features obtained from reports, images, and tags for effective report generation.…”

Section: Dataset Bias (mentioning)

confidence: 99%

Transformers in Medical Imaging: A Survey

Shamshad, Khan, Zamir et al. 2022

Preprint


Following unprecedented success on the natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growing interest in Transformers that can capture global context compared to CNNs with local receptive fields. Inspired by this transition, in this survey, we attempt to provide a comprehensive review of the applications of Transformers in medical imaging covering various aspects, ranging from recently proposed architectural designs to unsolved issues. Specifically, we survey the use of Transformers in medical image segmentation, detection, classification, reconstruction, synthesis, registration, clinical report generation, and other tasks. In particular, for each of these applications, we develop taxonomy, identify application-specific challenges as well as provide insights to solve them, and highlight recent trends. Further, we provide a critical discussion of the field's current state as a whole, including the identification of key challenges, open problems, and outlining promising future directions. We hope this survey will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of Transformer models in medical imaging. Finally, to cope with the rapid development in this field, we intend to regularly update the relevant latest papers and their open-source implementations at https://github.com/fahadshamshad/awesome-transformers-in-medical-imaging.


“…Apart from some familiar topics such as disease detection (Oh et al, 2020; Luo et al, 2020; Lu et al, 2020b; Rajpurkar et al, 2017; Lu et al, 2020a; Ranjan et al, 2018) and lung segmentation (Eslami et al, 2020), the most related computer vision task is the emerging topic of image-based captioning, which aims at generating realistic sentences or topic-related paragraphs to summarize visual contents from images or videos (Vinyals et al, 2015; Xu et al, 2015; Goyal et al, 2017; Rennie et al, 2017; Huang et al, 2019; Feng et al, 2019; Pei et al, 2019; Tran et al, 2020). Not surprisingly, the recent progresses in medical report generation (Jing et al, 2018, 2019; Li et al, 2018; Xue et al, 2018; Yuan et al, 2019; Wang et al, 2018; Lovelace and Mortazavi, 2020; Srinivasan et al, 2020; Zhang et al, 2020; Huang et al, 2021; Gasimova et al, 2020; Singh et al, 2019; Nishino et al, 2020) have been particularly influenced by the successes in image-based captioning. The work of (Vinyals et al, 2015; Xu et al, 2015) is among the early approaches in medical report generation, where visual features are extracted by convolution neural networks (CNNs); they are subsequently fed into recurrent neural networks (RNNs) to generate textual descriptions.…”

Section: Image-based Captioning and Medical Report Generation (mentioning)

confidence: 99%

“…It has been noted by (Jing et al, 2018, 2019; Li et al, 2018) that traditional RNNs are not well suited in generating long sentences and paragraphs (Vaswani et al, 2017; Krause et al, 2017), which renders them insufficient in medical report generation task (Jing et al, 2018). This issue is relieved by either conceiving hierarchical RNN architectures (Krause et al, 2017) (Jing et al, 2018, 2019; Li et al, 2018; Xue et al, 2018; Yuan et al, 2019; Wang et al, 2018), or resorting to alternative techniques including in particular the recently developed transformer architectures (Vaswani et al, 2017) (Srinivasan et al, 2020; Lovelace and Mortazavi, 2020).…”

Section: Image-based Captioning and Medical Report Generation (mentioning)

confidence: 99%


Automated Generation of Accurate & Fluent Medical X-ray Reports

Nguyen, Nie, Badamdorj et al. 2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing


Our paper aims to automate the generation of medical reports from chest X-ray image inputs, a critical yet time-consuming task for radiologists. Existing medical report generation efforts emphasize producing human-readable reports, yet the generated text may not be well aligned to the clinical facts. Our generated medical reports, on the other hand, are fluent and, more importantly, clinically accurate. This is achieved by our fully differentiable and end-to-end paradigm that contains three complementary modules: taking the chest X-ray images and clinical history document of patients as inputs, our classification module produces an internal checklist of disease-related topics, referred to as enriched disease embedding; the embedding representation is then passed to our transformer-based generator, to produce the medical report; meanwhile, our generator also creates a weighted embedding representation, which is fed to our interpreter to ensure consistency with respect to disease-related topics. Empirical evaluations demonstrate very promising results achieved by our approach on commonly-used metrics concerning language fluency and clinical accuracy. Moreover, noticeable performance gains are consistently observed when additional input information is available, such as the clinical document and extra scans from different views.
