BACKGROUND
Asthma-related symptoms are significant predictors of asthma exacerbation. Most of these symptoms are documented in clinical notes in free text format. Methods that can effectively capture the asthma-related symptoms from the unstructured data are lacking.
OBJECTIVE
This study aims to develop a natural language processing (NLP) algorithm and workflow to identify asthma-related symptoms from clinical notes within a large integrated healthcare system.
METHODS
We used unstructured data from the two years prior to asthma diagnosis visits in 2013-2018 and 2021-2022 to identify four common asthma-related symptoms: cough, wheezing, dyspnea, and chest tightness. Related terms and phrases were compiled from publicly available resources and then iteratively reviewed and enriched with input from clinicians and chart review. A rule-based NLP algorithm was developed and refined through multiple rounds of chart review followed by adjudication, and transformer-based deep learning algorithms were then developed and validated using the same manually annotated datasets. Subsequently, a hybrid algorithm was generated by combining the rule-based and transformer-based algorithms. Finally, the developed algorithms were applied to all study notes.
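The rule-based and hybrid steps described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the term lists, the simple negation check, the `model_predict` callable standing in for a transformer classifier, and the choice to combine the two outputs by set union are all assumptions made for illustration, since the abstract does not specify the lexicon or the combination rule.

```python
import re

# Hypothetical term lists; the study's actual lexicon is not given in the text.
SYMPTOM_TERMS = {
    "cough": [r"cough\w*"],
    "wheezing": [r"wheez\w*"],
    "dyspnea": [r"dyspnea", r"shortness of breath", r"short of breath"],
    "chest tightness": [r"chest tightness", r"chest (?:feels|felt) tight"],
}

# Assumed, deliberately simple negation cues checked in the text before a match.
NEGATION = re.compile(r"\b(?:no|not|denies|denied|without|negative for)\b", re.IGNORECASE)

PATTERNS = {
    symptom: re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    for symptom, terms in SYMPTOM_TERMS.items()
}


def rule_based_symptoms(sentence):
    """Return symptoms whose first mention has no negation cue before it."""
    found = set()
    for symptom, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        if match and not NEGATION.search(sentence[:match.start()]):
            found.add(symptom)
    return found


def hybrid_symptoms(sentence, model_predict):
    """Combine rule-based and model outputs by union (an assumed rule)."""
    return rule_based_symptoms(sentence) | set(model_predict(sentence))


def note_symptoms(sentences, model_predict):
    """A note is positive for a symptom if any of its sentences is."""
    found = set()
    for sentence in sentences:
        found |= hybrid_symptoms(sentence, model_predict)
    return found
```

Here `model_predict` is any function that maps a sentence to an iterable of symptom labels, so a fine-tuned transformer classifier could be plugged in without changing the rule-based part.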
RESULTS
A total of 11,374,552 eligible clinical notes containing 128,211,793 sentences were retrieved. At least one symptom was identified in 1,663,450 (1.30%) sentences and 858,350 (7.55%) notes. Cough had the highest frequency at both the sentence (1.07%) and note (5.81%) levels, while chest tightness had the lowest at both the sentence (0.11%) and note (0.57%) levels. The frequencies of concomitant symptoms ranged from 0.03% to 0.38% at the sentence level and from 0.10% to 1.85% at the note level. Validation of the hybrid algorithm against the annotated results of 1,600 clinical notes yielded positive predictive values ranging from 96.53% (wheezing) to 97.42% (chest tightness) at the sentence level and from 96.76% (wheezing) to 97.42% (chest tightness) at the note level. Sensitivity ranged from 93.90% (dyspnea) to 95.95% (cough) at the sentence level and from 96.00% (chest tightness) to 99.07% (cough) at the note level. The corresponding F1 scores for all four symptoms were > 0.95 at both the sentence and note levels, regardless of the NLP algorithm used.
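For readers less familiar with these validation metrics, the sketch below shows the standard definitions of positive predictive value, sensitivity, and F1 score in terms of true positives, false positives, and false negatives. The function names are illustrative, and the only study figures used are the note-level chest tightness values reported above (positive predictive value 97.42%, sensitivity 96.00%); the underlying counts are not given in the abstract.

```python
def ppv(tp, fp):
    """Positive predictive value: share of flagged items that are truly positive."""
    return tp / (tp + fp)


def sensitivity(tp, fn):
    """Sensitivity: share of truly positive items that were flagged."""
    return tp / (tp + fn)


def f1_score(ppv_value, sensitivity_value):
    """F1 score: harmonic mean of positive predictive value and sensitivity."""
    return 2 * ppv_value * sensitivity_value / (ppv_value + sensitivity_value)


# Note-level chest tightness values reported in the RESULTS paragraph.
chest_tightness_f1 = f1_score(0.9742, 0.9600)
print(round(chest_tightness_f1, 3))  # about 0.967, consistent with F1 > 0.95
```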
CONCLUSIONS
The developed NLP algorithms effectively captured asthma-related symptoms from unstructured clinical notes. These algorithms could be used to assess asthma burden and to support prediction of asthma exacerbation.