Classifying the lifestyle status for Alzheimer’s disease from clinical notes using deep learning with weak supervision

Shen, Zitao; Schutte, Dalton; Yi, Yoonkwon; Bompelli, Anusha; Yu, Fang; Wang, Yanshan; Zhang, Rui

doi:10.1186/s12911-022-01819-4

Cited by 17 publications

(7 citation statements)

References 22 publications

“…Artificial intelligence models that use word and paragraph embedding are likely to perform better on these kinds of tasks [5, 28–30]. Future work could therefore focus on the use of these types of dedicated models for classifying papers on the Three Rs. Given that we have an end-to-end framework, in which training and application of the model for the user are combined in a single platform selection, new models from the fast-evolving field of language-based AI models can be rapidly deployed for use.…”

Section: Discussion (mentioning)

confidence: 99%

The 3Ranker: An AI-based Algorithm for Finding Non-animal Alternative Methods

van Beuningen, Alkema, Hijlkema et al. 2023. Altern Lab Anim

The search for existing non-animal alternative methods for use in experiments is currently challenging because of the lack of both comprehensive structured databases and balanced keyword-based search strategies to mine unstructured textual databases. In this paper we describe 3Ranker, which is a fast, keyword-independent algorithm for finding non-animal alternative methods for use in biomedical research. The 3Ranker algorithm was created by using a machine learning approach, consisting of a Random Forest model built on a dataset of 35 million abstracts and constructed with weak supervision, followed by iterative model improvement with expert curated data. We found a satisfactory trade-off between sensitivity and specificity, with Area Under the Curve (AUC) values ranging from 0.85 to 0.95. Trials showed that the AI-based classifier was able to identify articles that describe potential alternatives to animal use, among the thousands of articles returned by generic PubMed queries on dermatitis and Parkinson’s disease. Application of the classification models on time series data showed the earlier implementation and acceptance of Three Rs principles in the area of cosmetics and skin research, as compared to the area of neurodegenerative disease research. The 3Ranker algorithm is freely available at www.open3r.org; the future goal is to expand this framework to cover multiple research domains and to enable its broad use by researchers, policymakers, funders and ethical review boards, in order to promote the replacement of animal use in research wherever possible.

“…However, the lack of labelled data is a major bottleneck to applying such technique for disease information extraction (40). Recently DL-based NLP models trained with rule-based method labelled data (weak supervision learning) has been successfully applied into medical free-text mining tasks such as hip fracture classification (38) and Alzheimer’s disease risk factor characterisation (41), achieving competitive performance compared to models trained on human-annotated data. Promisingly, our proposed rule-based NLP method provides a foundation for application of these techniques to identify HCC from free-text imaging reports.…”

Section: Discussion (mentioning)

confidence: 99%

Identifying Hepatocellular Carcinoma from imaging reports using natural language processing to facilitate data extraction from electronic patient records

Wang, Glampson, Mercuri et al. 2022. Preprint

Background: The National Institute for Health Research Health Informatics Collaborative (NIHR HIC) viral hepatitis theme is working to overcome governance and data challenges to collate routine clinical data from electronic patient records from multiple UK hospital sites for translational research. The development of hepatocellular carcinoma (HCC) is a critical outcome for patients with viral hepatitis, with the drivers of cancer transformation poorly understood. Objective: This study aims to develop a natural language processing (NLP) algorithm for automatic HCC identification from imaging reports to facilitate studies into HCC. Methods: 1140 imaging reports were retrieved from the NIHR HIC viral hepatitis research database v1.0. These reports were from two sites, one used for method development (site 1) and the other for validation (site 2). Reports were initially manually annotated as binary classes (HCC vs. non-HCC). We designed inference rules for recognising HCC presence, wherein medical terms for eligibility criteria of HCC were determined by domain experts. A rule-based NLP algorithm with five submodules (regular expressions of medical terms, terms recognition, negation detection, sentence tagging, and report label generation) was developed and iteratively tuned. Results: Our rule-based algorithm achieves an accuracy of 99.85% (sensitivity: 90%, specificity: 100%) for identifying HCC on the development set and 99.59% (sensitivity: 100%, specificity: 99.58%) on the validation set. This method outperforms several off-the-shelf models on HCC identification, including 'machine learning based' and 'deep learning based' text classifiers, in achieving significantly higher sensitivity. Conclusion: Our rule-based NLP method gives high sensitivity and high specificity for HCC identification, even from imbalanced datasets with a small number of positive cases, and can be used to rapidly screen imaging reports at large scale to facilitate epidemiological and clinical studies into HCC.
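The five submodules named in the abstract above (regular expressions of medical terms, terms recognition, negation detection, sentence tagging, and report label generation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the term list, the negation cues, the sentence splitter, and the function names `split_sentences`, `tag_sentence`, and `label_report` are all hypothetical and chosen only to show how such a pipeline might fit together.

```python
import re

# Hypothetical term and negation patterns, for illustration only;
# the study's expert-defined rules are not given in the abstract.
HCC_TERMS = re.compile(r"\b(hepatocellular carcinoma|hcc|hepatoma)\b", re.IGNORECASE)
NEGATION_CUES = re.compile(r"\b(no|not|without|negative for)\b", re.IGNORECASE)


def split_sentences(report: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def tag_sentence(sentence: str) -> str:
    """Tag a sentence as 'positive', 'negated', or 'none'."""
    match = HCC_TERMS.search(sentence)  # term recognition
    if match is None:
        return "none"
    # Negation detection: a cue anywhere before the first term in the sentence.
    if NEGATION_CUES.search(sentence[: match.start()]):
        return "negated"
    return "positive"


def label_report(report: str) -> str:
    """Report label generation: HCC if any sentence is tagged positive."""
    tags = [tag_sentence(s) for s in split_sentences(report)]
    return "HCC" if "positive" in tags else "non-HCC"
```

A report is labelled HCC here as soon as one sentence carries a non-negated term. A real system would need more careful negation scoping and sentence splitting than this sketch provides.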

“…In clinical NLP, studies use lexical or concept filtering rules to create labelled data to extract nuanced categories (e.g. suicidal ideation [28] or lifestyle factors for Alzheimer’s Disease [29]) from clinical texts. We extend over this line of research by using ontologies and a medical concept labelling tool with two specific rules to create reliable weak data to extract rare diseases.…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…The second aspect is data representation, representing the contexts and semantics in the data into vectors in a high-dimensional space for subsequent steps in machine learning. For deep learning methods, previous studies [13, 29] proposed to use neural word embeddings and more recently using BERT [30] to represent the contexts of the textual data. We follow this direction to apply weak supervision with contextual representations for rare disease phenotyping.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Ontology-driven and weakly supervised rare disease identification from clinical notes

Dong, Suárez-Paniagua, Zhang et al. 2023. BMC Med Inform Decis Mak

Background: Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to identify due to the few cases available for machine learning and the need for data annotation from domain experts. Methods: We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT). The ontology-driven framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in Unified Medical Language System (UMLS), with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach is proposed to learn a phenotype confirmation model to improve Text-to-UMLS linking, without annotated data from domain experts. We evaluated the approach on three clinical datasets, MIMIC-III discharge summaries, MIMIC-III radiology reports, and NHS Tayside brain imaging reports from two institutions in the US and the UK, with annotations. Results: The improvements in the precision were pronounced (by over 30% to 50% absolute score for Text-to-UMLS linking), with almost no loss of recall compared to the existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and NHS Tayside were consistent with the discharge summaries. The overall pipeline processing clinical notes can extract rare disease cases, mostly uncaptured in structured data (manually assigned ICD codes). Conclusion: The study provides empirical evidence for the task by applying a weakly supervised NLP pipeline on clinical notes. The proposed weakly supervised deep learning approach requires no human annotation except for validation and testing, by leveraging ontologies, NER+L tools, and contextual representations. The study also demonstrates that Natural Language Processing (NLP) can complement traditional ICD-based approaches to better estimate rare diseases in clinical notes. We discuss the usefulness and limitations of the weak supervision approach and propose directions for future studies.
