Labelling Imaging Datasets on the Basis of Neuroradiology Reports: A Validation Study

Wood, David A.; Kafiabadi, Sina; Busaidi, Aisha Al; Guilhem, Emily; Lynch, Jeremy; Townend, Matthew; Montvila, Antanas; Siddiqui, Juveria; Gadapa, Naveen; Benger, Matthew; Barker, Gareth J.; Ourselin, Sébastien; Cole, James H.; Booth, Thomas C.

doi:10.1007/978-3-030-61166-8_27

Cited by 10 publications

(17 citation statements)

References 10 publications


“…Previous studies have only reported model performance on a hold-out set of labelled reports [6,7,9], and to date, there has been no investigation into the general validity of NLP-derived labels for head MRI examinations [36]. An important question [Table 2: Reference-standard report labels across all abnormality categories.]…”

Section: NLP Modelling (mentioning)

confidence: 99%

Deep learning to automate the labelling of head MRI datasets for computer vision applications

et al. 2021

Self Cite


Objectives: The purpose of this study was to build a deep learning model to derive labels from neuroradiology reports and assign these to the corresponding examinations, overcoming a bottleneck to computer vision model development.

Methods: Reference-standard labels were generated by a team of neuroradiologists for model training and evaluation. Three thousand examinations were labelled for the presence or absence of any abnormality by manually scrutinising the corresponding radiology reports (‘reference-standard report labels’); a subset of these examinations (n = 250) were assigned ‘reference-standard image labels’ by interrogating the actual images. Separately, 2000 reports were labelled for the presence or absence of 7 specialised categories of abnormality (acute stroke, mass, atrophy, vascular abnormality, small vessel disease, white matter inflammation, encephalomalacia), with a subset of these examinations (n = 700) also assigned reference-standard image labels. A deep learning model was trained using labelled reports and validated in two ways: comparing predicted labels to (i) reference-standard report labels and (ii) reference-standard image labels. The area under the receiver operating characteristic curve (AUC-ROC) was used to quantify model performance. Accuracy, sensitivity, specificity, and F1 score were also calculated.

Results: Accurate classification (AUC-ROC > 0.95) was achieved for all categories when tested against reference-standard report labels. A drop in performance (ΔAUC-ROC > 0.02) was seen for three categories (atrophy, encephalomalacia, vascular) when tested against reference-standard image labels, highlighting discrepancies in the original reports. Once trained, the model assigned labels to 121,556 examinations in under 30 min.

Conclusions: Our model accurately classifies head MRI examinations, enabling automated dataset labelling for downstream computer vision applications.

Key Points:
• Deep learning is poised to revolutionise image recognition tasks in radiology; however, a barrier to clinical adoption is the difficulty of obtaining large labelled datasets for model training.
• We demonstrate a deep learning model which can derive labels from neuroradiology reports and assign these to the corresponding examinations at scale, facilitating the development of downstream computer vision models.
• We rigorously tested our model by comparing labels predicted on the basis of neuroradiology reports with two sets of reference-standard labels: (1) labels derived by manually scrutinising each radiology report and (2) labels derived by interrogating the actual images.
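The abstract above names AUC-ROC, accuracy, sensitivity, specificity, and F1 score as its evaluation metrics. As a rough illustration of how these are computed for one binary category, here is a minimal pure-Python sketch. The function names (`binary_metrics`, `auc_roc`) and the toy data are assumptions made for illustration; they are not taken from the paper, which does not describe its evaluation code.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 from 0/1 labels (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a positive outscores a negative.

    Ties count as half a win (the Mann-Whitney formulation). This is
    quadratic in the number of examples, which is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 1, 0, 0]
    model_scores = [0.9, 0.4, 0.6, 0.1]
    predictions = [1 if s >= 0.5 else 0 for s in model_scores]
    print(binary_metrics(labels, predictions))
    print(auc_roc(labels, model_scores))
```

The same predictions could be scored twice, once against report-derived labels and once against image-derived labels, to obtain the kind of ΔAUC-ROC comparison the abstract reports.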



“…Creating a manual annotation protocol is difficult [6] and the protocol constantly evolves as new data are encountered and labelled. It is therefore useful to be able to encode certain phrases/rules from the protocol in a template so that they can be learned by the model.…”

Section: Protocol-derived Templates (mentioning)

confidence: 99%

“…However, extracting labels from text can be challenging because the language in radiology reports is diverse, domain-specific, and often difficult to interpret. Therefore, the task of reading the radiology report and assigning labels is not trivial and requires a certain degree of medical knowledge on the part of a human annotator [6]. When we rely on pure data-driven learning, we find that the model sometimes fails to learn critical features or learns the correct answer via simple heuristics (e.g., that presence of the word "likely" indicates positivity) rather than valid reasoning, and thus fails to generalise to rarer cases which have not been learned or where the heuristics break down (e.g., "likely represents prominent VR space or lacunar infarct" which indicates uncertainty over two differential diagnoses).…”

Section: Introduction (mentioning)

confidence: 99%

Templated Text Synthesis for Expert-Guided Multi-Label Extraction from Radiology Reports

Schrempf, Watson, Park, et al. 2021

MAKE


Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data which is time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to perform automatic extraction of many labels in parallel. However, if we rely on pure data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that “likely” indicates positivity), and thus fails to generalise to rarer cases which have not been learned or where the heuristics break down (e.g., “likely represents prominent VR space or lacunar infarct” which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies, and to teach the model rules on how to label difficult cases, by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture i.e., PubMedBERT, we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
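The template idea described above can be pictured roughly as follows: a small set of sentence templates, each tied to a label, is filled with entity names to produce synthetic labelled training examples. This is only a sketch under stated assumptions. The templates, the entity list, the label names (`positive`, `negative`, `uncertain`), and the function `synthesise` are all hypothetical; only the "likely represents ... or ..." pattern and its reading as uncertainty over two differentials come from the abstract.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Hypothetical entity names; a real system might draw these from a medical ontology.
ENTITIES = ["lacunar infarct", "prominent VR space", "haemorrhage"]

# Hypothetical templates mentioning one entity, paired with the label they imply.
SINGLE_TEMPLATES = [
    ("There is evidence of {a}.", "positive"),
    ("No evidence of {a}.", "negative"),
]

# Template over two differential diagnoses: both entities are labelled uncertain.
PAIR_TEMPLATES = [
    ("Likely represents {a} or {b}.", "uncertain"),
]

Example = Tuple[str, Dict[str, str]]


def synthesise(entities: Sequence[str]) -> List[Example]:
    """Fill every template with every entity (or ordered entity pair)."""
    examples: List[Example] = []
    for text, label in SINGLE_TEMPLATES:
        for a in entities:
            examples.append((text.format(a=a), {a: label}))
    for text, label in PAIR_TEMPLATES:
        for a, b in permutations(entities, 2):
            examples.append((text.format(a=a, b=b), {a: label, b: label}))
    return examples


if __name__ == "__main__":
    for sentence, labels in synthesise(ENTITIES):
        print(sentence, labels)
```

Synthetic sentences of this kind would be added to the real annotated reports at training time, so that the rule encoded in a template is seen often enough to be learned.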


“…Generating these labels can be performed manually (each report is read by a human rater who assigns diagnostic codes or other labels) or automatically (a rules-based or machine-learning-based NLP technique automatically assigns labels to reports). The latter techniques usually rely on a subset of reports which have been annotated manually to allow supervised learning [4][5][6]. Labels generated manually are often referred to as the "ground truth".…”

Section: Background and Significance (mentioning)

confidence: 99%

ToKSA - Tokenized Key Sentence Annotation - a Novel Method for Rapid Approximation of Ground Truth for Natural Language Processing

Fairfield, Cambridge, Cullen, et al. 2021

Preprint


Objective: Identifying phenotypes and pathology from free text is an essential task for clinical work and research. Natural language processing (NLP) is a key tool for processing free text at scale. Developing and validating NLP models requires labelled data. Labels are generated through time-consuming and repetitive manual annotation and are hard to obtain for sensitive clinical data. The objective of this paper is to describe a novel approach for annotating radiology reports.

Materials and Methods: We implemented tokenized key sentence-specific annotation (ToKSA) for annotating clinical data. We demonstrate ToKSA using 180,050 abdominal ultrasound reports with labels generated for symptom status, gallstone status and cholecystectomy status. Firstly, individual sentences are grouped together into a term-frequency matrix. Annotation of key (i.e. the most frequently occurring) sentences is then used to generate labels for multiple reports simultaneously. We compared ToKSA-derived labels to those generated by annotating full reports. We used ToKSA-derived labels to train a document classifier using convolutional neural networks. We compared performance of the classifier to a separate classifier trained on labels based on the full reports.

Results: By annotating only 2,000 frequent sentences, we were able to generate labels for symptom status for 70,000 reports (accuracy 98.4%), gallstone status for 85,177 reports (accuracy 99.2%) and cholecystectomy status for 85,177 reports (accuracy 100%). The accuracy of the document classifier trained on ToKSA labels was similar (0.1-1.1% more accurate) to the document classifier trained on full report labels.

Conclusion: ToKSA offers an accurate and efficient method for annotating free text clinical data.
