BACKGROUND
Negation and speculation unrelated to abnormal findings can trigger false positive alarms when radiology reports are automatically highlighted or flagged by laboratory information systems.
OBJECTIVE
This internal validation study evaluates the performance of natural language processing (NLP) methods (NegEx, NegBio, NegBERT, and fine-tuned Transformer models) in detecting negative and speculative statements unrelated to abnormal findings in radiology reports.
METHODS
We annotated all negative statements and all speculative statements unrelated to abnormal findings in the reports. In Experiment 1, we fine-tuned several Transformer models (ALBERT, BERT, DeBERTa, DistilBERT, ELECTRA, ERNIE, RoBERTa, SpanBERT, XLNet) and compared their performance using precision, recall, accuracy, and F1 scores. In Experiment 2, we compared the best model from Experiment 1 with three established negation and speculation detection algorithms (NegEx, NegBio, NegBERT).
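The following is a minimal sketch of the Experiment 1 fine-tuning setup, assuming a statement-level binary classification formulation with the Hugging Face transformers and datasets libraries; the file names, column names, and hyperparameters are illustrative assumptions, not the study's actual configuration.

# Minimal fine-tuning sketch (illustrative; not the study's actual pipeline).
# Assumes hypothetical CSV files with columns "text" (one statement) and
# "label" (1 = negative/speculative-unrelated, 0 = affirmative).
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "albert-base-v2"  # swap in bert-base-uncased, roberta-base, etc.

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)

dataset = load_dataset("csv",
                       data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Truncate long statements; padding is handled per batch by the Trainer.
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()

The same loop can be repeated per model name to reproduce the Experiment 1 comparison.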
RESULTS
We collected 6000 radiology reports from three branches of Chi Mei Hospital, covering multiple imaging modalities and body parts. Overall, 15.0% of words and 39.5% of important diagnostic keywords occurred in negative statements or in speculative statements unrelated to abnormal findings.
In Experiment 1, all models achieved accuracy above 98% and F1 scores above 90% on the test dataset; ALBERT performed best (accuracy 99.1%, F1 score 95.8%). In Experiment 2, ALBERT outperformed the optimized NegEx, NegBio, and NegBERT methods overall (accuracy 99.6%, F1 score 99.1%) and in predicting whether diagnostic keywords occur in speculative statements unrelated to abnormal findings.
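For reference, the reported metrics can be computed from gold labels and model predictions as in the following scikit-learn sketch; the label arrays are illustrative placeholders, not study data.

# Metric computation sketch (illustrative labels, not study data).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # gold: 1 = negative/speculative-unrelated
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]  # model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
print(f"accuracy={accuracy_score(y_true, y_pred):.3f} "
      f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")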
CONCLUSIONS
The fine-tuned ALBERT model showed the best performance of all methods evaluated. These results represent a substantial step toward the clinical application of computer-aided notification systems.
CLINICALTRIAL
Not applicable