Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio‐Canary comorbidity project

Berman, Aaron D.; Biery, David; Ginder, Curtis; Hulme, Olivia L.; Marcusa, Daniel P.; Leiva, Orly; Wu, Winona; Cardin, Nicholas; Hainer, Jon; Bhatt, Deepak L.; Di Carli, Marcelo F.; Turchin, Alexander; Blankstein, Ron

doi:10.1002/clc.23687

Cited by 20 publications (13 citation statements)

References 28 publications


“…The performance of our main NLP-ML-CLINICAL pipeline is comparable to the best published results for similar approaches, although slightly lower than that reached by Singh et al [13,15,37]. We emphasize that these studies remain difficult to compare, as they focus on different languages (French vs English) and texts (eg, Singh et al consider only medical and surgical history sections). Aggregating extractions from the entity level to the stay level significantly improves performance, which is a known and notable result, as stay-level or patient-level features are often of higher importance and interest than entity-level features, eg, in epidemiological studies.…”

Section: Discussion (supporting; confidence: 66%)

“…Aggregating extractions from the entity level to the stay level significantly improves performance, which is a known and notable result, as stay-level or patient-level features are often of higher importance and interest than entity-level features, eg, in epidemiological studies [15]. Notably, sensitivity is greatly improved, as the aggregation step allows missed entities of a specific condition to be compensated for by other occurrences of the same condition. Similarly, conditions with 2 levels of severity benefit from this aggregation, since severity is not necessarily mentioned in each entity.…”

Section: Discussion (mentioning; confidence: 99%)
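The entity-to-stay aggregation described in these excerpts can be illustrated with a small sketch. It assumes a simple "any-mention" rule: a (stay, condition) pair is positive if at least one of its mentions is detected, and its severity is the highest level stated by any detected mention. The excerpts do not specify the exact rule used, and the `Entity` schema, the `aggregate_to_stay` function and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """One mention of a condition in a note from a hospital stay (hypothetical schema)."""
    stay_id: str
    condition: str
    detected: bool                  # entity-level model output: mention flagged as a present condition
    severity: Optional[int] = None  # e.g. 1 = uncomplicated, 2 = complicated; None if not stated


def aggregate_to_stay(entities: Iterable[Entity]) -> Dict[Tuple[str, str], Optional[int]]:
    """Any-mention rule (assumed): (stay, condition) is positive if any mention is detected.

    The value is the highest severity stated by a detected mention, or None
    when no detected mention states a severity.
    """
    stays: Dict[Tuple[str, str], Optional[int]] = {}
    for e in entities:
        if not e.detected:
            continue  # a missed mention is skipped; other mentions of the same condition can still flag the stay
        key = (e.stay_id, e.condition)
        best = stays.get(key)
        if e.severity is not None and (best is None or e.severity > best):
            best = e.severity
        stays[key] = best
    return stays


if __name__ == "__main__":
    # Toy data: every mention is assumed to be a true (gold-positive) mention.
    mentions = [
        Entity("stay-1", "diabetes", detected=False),
        Entity("stay-1", "diabetes", detected=True),
        Entity("stay-1", "diabetes", detected=True, severity=2),
        Entity("stay-2", "heart_failure", detected=False),
        Entity("stay-2", "heart_failure", detected=True),
    ]
    entity_sensitivity = sum(m.detected for m in mentions) / len(mentions)
    gold_stays = {(m.stay_id, m.condition) for m in mentions}
    stay_level = aggregate_to_stay(mentions)
    stay_sensitivity = len(stay_level.keys() & gold_stays) / len(gold_stays)
    print(entity_sensitivity, stay_sensitivity)  # prints: 0.6 1.0
    print(stay_level)
```

In the toy example, two of five mentions are missed, yet both stays are still detected, and the severity stated in only one mention carries over to its stay. This mirrors the sensitivity and severity effects described in the excerpt; note that an any-mention rule also propagates false-positive mentions to the stay level.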


Collaborative and privacy-enhancing workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions

Petit-Jean, Gérardin, Berthelot et al. 2024

Journal of the American Medical Informatics Association

Objective: To develop and validate a natural language processing (NLP) pipeline that detects 18 conditions in French clinical notes, including 16 comorbidities of the Charlson index, while exploring a collaborative and privacy-enhancing workflow. Materials and Methods: The detection pipeline relied on rule-based algorithms for named entity recognition and on machine learning algorithms for entity qualification. We used a large language model pre-trained on millions of clinical notes, along with clinical notes annotated in the context of 3 cohort studies related to oncology, cardiology, and rheumatology. The overall workflow was conceived to foster collaboration between studies while respecting the privacy constraints of the data warehouse. We estimated the added value of the advanced technologies and of the collaborative setting. Results: The pipeline reached a macro-averaged F1-score, positive predictive value, sensitivity, and specificity of 95.7 (95% CI 94.5-96.3), 95.4 (95% CI 94.0-96.3), 96.0 (95% CI 94.0-96.7), and 99.2 (95% CI 99.0-99.4), respectively. F1-scores were superior to those observed using alternative technologies or non-collaborative settings. The models were shared through a secured registry. Conclusions: We demonstrated that a community of investigators working on a common clinical data warehouse could efficiently and securely collaborate to develop, validate, and use sensitive artificial intelligence models. In particular, we provided an efficient and robust NLP pipeline that detects conditions mentioned in clinical notes.
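The four metrics reported in this abstract can be computed per condition from a confusion matrix and then macro-averaged, that is, averaged with equal weight across conditions. The sketch below assumes macro-averaging means the unweighted mean of per-condition scores (the convention of scikit-learn's `average="macro"`); the abstract does not state the exact formula or how its confidence intervals were derived, so CIs are not reproduced. `Counts`, `per_condition_metrics`, `macro_average` and the example counts are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for one condition (hypothetical structure)."""
    tp: int  # condition present, detected
    fp: int  # condition absent, detected
    fn: int  # condition present, missed
    tn: int  # condition absent, not detected


def per_condition_metrics(c: Counts) -> Dict[str, float]:
    # Assumes tp > 0 and tn + fp > 0, so no denominator is zero.
    ppv = c.tp / (c.tp + c.fp)          # positive predictive value (precision)
    sensitivity = c.tp / (c.tp + c.fn)  # recall
    specificity = c.tn / (c.tn + c.fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {"f1": f1, "ppv": ppv, "sensitivity": sensitivity, "specificity": specificity}


def macro_average(counts_by_condition: Dict[str, Counts]) -> Dict[str, float]:
    """Unweighted mean of each per-condition metric: every condition counts equally."""
    per_condition = [per_condition_metrics(c) for c in counts_by_condition.values()]
    return {name: mean(m[name] for m in per_condition) for name in per_condition[0]}


if __name__ == "__main__":
    toy_counts = {  # hypothetical counts, not taken from the study
        "diabetes": Counts(tp=90, fp=10, fn=10, tn=890),
        "heart_failure": Counts(tp=45, fp=5, fn=15, tn=935),
    }
    for name, value in macro_average(toy_counts).items():
        print(f"{name}: {100 * value:.1f}")  # scaled to percent, as in the abstract
```

Macro-averaging gives a rare condition the same weight as a frequent one; micro-averaging (pooling the counts of all conditions before computing each metric) would instead be dominated by the most frequent conditions.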


“…The performance of our main NLP-ML-CLINICAL pipeline is comparable to the best published results for similar algorithms, although slightly lower than that reached by Singh et al. [13,15,36]. We emphasize that these studies remain difficult to compare, as they focus on different languages (French vs. English) and texts (e.g., Singh et al. consider only medical and surgical history sections).…”

Section: Discussion (mentioning; confidence: 99%)

“…Nevertheless, it was shown that information could instead be efficiently obtained from clinical notes using Natural Language Processing (NLP) algorithms, with those algorithms increasingly relying on machine learning (ML) techniques such as language models [12–19]. Developing tools to this end remains challenging, and many difficulties are yet to be overcome for a wide community to benefit from them [4,7,19–24]. First, the optimal NLP technologies are still debated.…”

Section: Introduction (mentioning; confidence: 99%)

Collaborative and privacy-preserving workflows on a clinical data warehouse: an example developing natural language processing pipelines to detect medical conditions

Petit-Jean, Gérardin, Berthelot et al. 2023

Preprint

Objective: To develop and validate advanced natural language processing pipelines that detect 18 conditions in clinical notes written in French, including 16 comorbidities of the Charlson index, while exploring a collaborative and privacy-preserving workflow. Materials and methods: The detection pipelines relied on rule-based algorithms for named entity recognition and on machine learning algorithms for entity qualification. We used a large language model pre-trained on millions of clinical notes, along with clinical notes annotated in the context of three cohort studies related to oncology, cardiology and rheumatology. The overall workflow was conceived to foster collaboration between studies while complying with the privacy constraints of the data warehouse. We estimated the added value of both the advanced technologies and the collaborative setting. Results: The 18 pipelines reached a macro-averaged F1-score, positive predictive value, sensitivity and specificity of 95.7 (95% CI 94.5-96.3), 95.4 (95% CI 94.0-96.3), 96.0 (95% CI 94.0-96.7) and 99.2 (95% CI 99.0-99.4), respectively. F1-scores were superior to those observed using either alternative technologies or non-collaborative settings. The models were shared through a secured registry. Conclusions: We demonstrated that a community of investigators working on a common clinical data warehouse could efficiently and securely collaborate to develop, validate and use sensitive artificial intelligence models. In particular, we provided efficient and robust natural language processing pipelines that detect conditions mentioned in clinical notes.


“…Such techniques offer promising solutions for aiding in the often still manual and labour-intensive process of ICD coding, and for correcting ICD-code-related errors [6].…”

Section: Introduction (mentioning; confidence: 99%)

Using natural language processing for automated classification of disease and to identify misclassified ICD codes in cardiac disease

Falter, Godderis, Scherrenberg et al. 2024

European Heart Journal - Digital Health

Introduction: ICD codes are used to classify hospitalisations. The codes are used for administrative, financial and research purposes. It is known, however, that errors occur. Natural language processing (NLP) offers promising solutions for optimising the process. Objectives: To investigate methods for automatic classification of disease in unstructured medical records using NLP and to compare these to conventional ICD coding. Methods: Two datasets were used: the open-source MIMIC-III dataset (n = 55,177) and a dataset from a hospital in Belgium (n = 12,706). Automated searches using NLP algorithms were performed for the diagnoses “atrial fibrillation” (AF) and “heart failure” (HF). Four methods were used: rule-based search, logistic regression, XGBoost on a term frequency-inverse document frequency (TF-IDF) matrix, and BioBERT. All algorithms were developed on the MIMIC-III dataset. The best-performing algorithm was then deployed on the Belgian dataset. Results: After pre-processing, a total of 1,438 reports were retained in the Belgian dataset. XGBoost on the TF-IDF matrix achieved an accuracy of 0.94 and 0.92 for AF and HF, respectively. There were 211 mismatches between the algorithm and the ICD codes, of which 103 were due to differences in data availability or differing definitions. Of the remaining 108 mismatches, 70% were due to incorrect labelling by the algorithm and 30% were due to erroneous ICD coding (2% of total hospitalisations). Discussion and conclusion: A newly developed NLP algorithm attained high accuracy for classifying disease in medical records. XGBoost outperformed the deep learning technique BioBERT. NLP algorithms could be used to identify ICD-coding errors and to optimise and support the ICD-coding process.
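The best-performing model in this abstract is XGBoost trained on a TF-IDF matrix. A minimal plain-Python sketch of the TF-IDF step is shown below. It assumes whitespace tokenization, raw term counts, the smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1 and L2 normalisation, which matches the weighting of scikit-learn's `TfidfVectorizer` defaults, although that class uses a different default tokenizer. The abstract does not describe the authors' preprocessing, and the `tfidf` function and example notes are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Weight = raw term count x smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1,
    then each vector is scaled to unit L2 norm.
    """
    tokenized = [doc.lower().split() for doc in documents]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: each term counted once per document
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        weights = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Empty documents have norm 0 and are returned as empty vectors.
        vectors.append({term: w / norm for term, w in weights.items()} if norm else weights)
    return vectors


if __name__ == "__main__":
    notes = [  # hypothetical snippets, not taken from either dataset
        "known atrial fibrillation on anticoagulation",
        "no history of heart failure",
        "heart failure with reduced ejection fraction",
    ]
    for vector in tfidf(notes):
        top_term = max(vector, key=vector.get)
        print(top_term, round(vector[top_term], 3))
```

Terms shared by many notes (here "heart" and "failure") receive lower weights than rarer terms. The resulting vectors would be the input features for a classifier such as XGBoost, a step not shown here. Because bag-of-words features ignore context, a negated mention such as “no history of heart failure” contributes the same terms as an affirmative one.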
