Building a tobacco user registry by extracting multiple smoking behaviors from clinical notes

Palmer, Ellen L.; Hassanpour, Saeed; Higgins, John; Doherty, Jennifer A.; Onega, Tracy

doi:10.1186/s12911-019-0863-3

Cited by 16 publications

(12 citation statements)

References 41 publications


“…Artificial intelligence and deep learning applications are making their way into clinical practice, but their efficacy is not yet proven in prospective trial settings [16]. The current study utilized more advanced deep learning compared to previous reports [18, 19, 20] in a retrospective proof-of-principle setting, where we successfully extracted a large amount of smoking data in a matter of days. Two language models were tested with good specificity (88%-98%), comparable to the previous results in English language.…”

Section: Discussion (mentioning)

confidence: 99%

“…[24] The learned knowledge was then transferred to a training classifier, by randomly picking 5000 tobacco smoking-related sample phrases and sentences from the medical narrative archive of our hospital, using the Finnish word-stem ‘tupak’ equivalent to the English word-stem ‘smok’ [19]. These sample phrases were manually labeled into three classes (never, former, or current smoker). ULMFiT- and BERT-based classification models were then trained on this data to produce smoking phrase classifiers.…”

Section: Methods (mentioning)

confidence: 99%
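
The stem-based sampling step described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the function name `sample_stem_sentences`, the naive punctuation-based sentence splitter, and the fixed random seed are all assumptions introduced here. Only the stem ‘tupak’ and the sample size of 5000 come from the quoted text.

```python
import random
import re


def sample_stem_sentences(notes, stem="tupak", k=5000, seed=0):
    """Return up to k randomly chosen sentences that contain the word stem.

    Hypothetical helper: the sentence splitting below is deliberately naive
    and is only an assumption about how candidate phrases might be found.
    """
    pattern = re.compile(re.escape(stem), re.IGNORECASE)
    hits = []
    for note in notes:
        # Split after sentence-ending punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", note):
            sentence = sentence.strip()
            if sentence and pattern.search(sentence):
                hits.append(sentence)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(hits, min(k, len(hits)))


# Toy notes (invented for illustration, not real patient data).
notes = [
    "Potilas tupakoi päivittäin. Ei allergioita.",
    "Lopettanut tupakoinnin 2015.",
    "Verenpaine normaali.",
]
selected = sample_stem_sentences(notes)
```

The selected sentences would then be labeled manually (never, former, or current smoker) before any classifier is trained on them; that labeling and training step is not shown here.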

“…SPSS version 26 (IBM, Armonk, NY) was used. Sensitivity and specificity analyses were calculated with 2 × 2 contingency tables separately for never, former, and persistent smokers, excluding patients with missing smoking status [19], using the Python ‘sklearn’ package.…”

Section: Methods (mentioning)

confidence: 99%
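
The per-class 2 × 2 analysis mentioned in the statement above can be sketched as a one-vs-rest calculation. The citing authors report using ‘sklearn’; the sketch below uses only the Python standard library to stay self-contained, so it is an illustration of the arithmetic rather than their code. The helper name `one_vs_rest_rates`, the toy labels, and the choice to treat a missing reference label (`None`) as the excluded case are assumptions.

```python
from collections import namedtuple

Rates = namedtuple("Rates", ["sensitivity", "specificity"])


def one_vs_rest_rates(y_true, y_pred, positive):
    """Collapse a multi-class result into a 2 x 2 table for one class.

    Pairs whose reference label is None are dropped (assumed meaning of
    "missing smoking status").
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t is not None]
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return Rates(sensitivity, specificity)


# Toy labels (invented for illustration only).
y_true = ["never", "never", "former", "former", "current", "current", None]
y_pred = ["never", "never", "former", "current", "current", "current", "never"]

for label in ("never", "former", "current"):
    print(label, one_vs_rest_rates(y_true, y_pred, label))
```

In this toy data one former smoker is misclassified as current, which lowers sensitivity for the former class and specificity for the current class while leaving the never class unaffected.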

“…[16, 17] This includes universal language modeling of medical narratives, wherein real-life practice health data, such as smoking status, are often presented in an unstructured format [18]. First reports of using language modeling to define an individual's smoking status appeared encouraging both from electronic health records (EHRs) [18, 19] and from user-generated content on a smoking cessation support website [20].…”

Section: Introduction (mentioning)

confidence: 99%


Impact of deep learning-determined smoking status on mortality of cancer patients: never too late to quit

et al. 2021


Background Persistent smoking after cancer diagnosis is associated with increased overall mortality (OM) and cancer mortality (CM). According to the 2020 Surgeon General's report, smoking cessation may reduce CM but supporting evidence is not wide. Use of deep learning-based modeling that enables universal natural language processing of medical narratives to acquire population-based real-life smoking data may help overcome the challenge. We assessed the effect of smoking status and within-1-year smoking cessation on CM by an in-house adapted freely available language processing algorithm. Materials and methods This cross-sectional real-world study included 29 823 patients diagnosed with cancer in 2009-2018 in Southwest Finland. The medical narrative, International Classification of Diseases-10th edition codes, histology, cancer treatment records, and death certificates were combined. Over 162 000 sentences describing tobacco smoking behavior were analyzed with ULMFiT and BERT algorithms. Results The language model classified the smoking status of 23 031 patients. Recent quitters had reduced CM [hazard ratio (HR) 0.80 (0.74-0.87)] and OM [HR 0.78 (0.72-0.84)] compared to persistent smokers. Compared to never smokers, persistent smokers had increased CM in head and neck, gastro-esophageal, pancreatic, lung, prostate, and breast cancer and Hodgkin's lymphoma, irrespective of age, comorbidities, performance status, or presence of metastatic disease. Increased CM was also observed in smokers with colorectal cancer, men with melanoma or bladder cancer, and lymphoid and myeloid leukemia, but no longer independently of the abovementioned covariates. Specificity and sensitivity were 96%/96%, 98%/68%, and 88%/99% for never, former, and current smokers, respectively, being essentially the same with both models. Conclusions Deep learning can be used to classify large amounts of smoking data from the medical narrative with good accuracy. The results highlight the detrimental effects of persistent smoking in oncologic patients and emphasize that smoking cessation should always be an essential element of patient counseling.


“…OSCG were subjected to biopsy to obtain diagnostic certainty. The data on tobacco consumption was recorded according to the number of pack-years smoked; a subject that smoked fifteen or more pack-years in twenty years was considered a smoker (13). One alcohol unit a day (one drink) was considered as regular alcohol consumption (14).…”

Section: Methods (mentioning)

confidence: 99%

Oral Human Papillomavirus: a multisite infection

Criscuolo, Morelatto, Belardinelli et al. 2020

Med Oral


Background: The Human Papillomavirus (HPV) has different strategies to persist in cells. This characteristic has led us to consider the presence of the virus in tissues of the oral cavity that had no clinical signs of infection. The aim of this study was to detect the presence of DNA-HPV at multiple sites of the oral cavity. Material and Methods: A case-control study was designed: Oral Squamous Carcinoma Group (OSCG), n=72, and Control Group (CG), n=72, healthy volunteers paired by sex and age with OSCG. Four samples were taken from OSCG: saliva, biopsy, brush scraping of lesion and contralateral healthy side. In CG a saliva sample and a scraping of the posterior border of the tongue were collected. HPV was detected by PCR using Bioneer Accuprep genomic DNA Extraction kit, and consensus primers MY09 and MY11. Chi square test was applied. Results: 432 samples were obtained from 144 individuals. DNA-HPV was detected in 30 (42%) of OSCG subjects and 3 (4%) of CG. Two or more positive samples were obtained in 67% of the OSCG, 67% in saliva and 60% in biopsy; in CG 100% of the individuals were positive in the two samples. Conclusions: HPV is frequently present in the oral cavity as a multifocal infection, even without the presence of clinical lesions.


A case study in applying artificial intelligence-based named entity recognition to develop an automated ophthalmic disease registry

Macri, Teoh, Bacchi et al. 2023

Graefes Arch Clin Exp Ophthalmol


Purpose Advances in artificial intelligence (AI)-based named entity extraction (NER) have improved the ability to extract diagnostic entities from unstructured, narrative, free-text data in electronic health records. However, there is a lack of ready-to-use tools and workflows to encourage the use among clinicians who often lack experience and training in AI. We sought to demonstrate a case study for developing an automated registry of ophthalmic diseases accompanied by a ready-to-use low-code tool for clinicians. Methods We extracted deidentified electronic clinical records from a single centre’s adult outpatient ophthalmology clinic from November 2019 to May 2022. We used a low-code annotation software tool (Prodigy) to annotate diagnoses and train a bespoke spaCy NER model to extract diagnoses and create an ophthalmic disease registry. Results A total of 123,194 diagnostic entities were extracted from 33,455 clinical records. After decapitalisation and removal of non-alphanumeric characters, there were 5070 distinct extracted diagnostic entities. The NER model achieved a precision of 0.8157, recall of 0.8099, and F score of 0.8128. Conclusion We presented a case study using low-code artificial intelligence-based NLP tools to produce an automated ophthalmic disease registry. The workflow created a NER model with a moderate overall ability to extract diagnoses from free-text electronic clinical records. We have produced a ready-to-use tool for clinicians to implement this low-code workflow in their institutions and encourage the uptake of artificial intelligence methods for case finding in electronic health records.
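
The post-processing step the abstract describes (decapitalisation and removal of non-alphanumeric characters before counting distinct entities) might look roughly like the sketch below. The function names `normalize_entity` and `distinct_entities` are hypothetical, and keeping whitespace between words while restricting to ASCII letters and digits are assumptions; the abstract does not specify either detail.

```python
import re


def normalize_entity(text):
    """Lowercase the text and strip characters other than a-z, 0-9, and spaces.

    Assumption: whitespace is kept and collapsed so multi-word diagnoses
    remain readable; non-ASCII letters are dropped by this pattern.
    """
    lowered = text.lower()
    cleaned = re.sub(r"[^a-z0-9\s]", "", lowered)
    return " ".join(cleaned.split())


def distinct_entities(entities):
    """Return the set of non-empty normalized entity strings."""
    normalized = (normalize_entity(e) for e in entities)
    return {n for n in normalized if n}


# Toy extracted entities (invented for illustration only).
raw = ["Glaucoma", "glaucoma.", "Dry-eye", "dry eye", "  "]
unique = distinct_entities(raw)
```

Note that under these assumptions "Dry-eye" and "dry eye" stay distinct ("dryeye" versus "dry eye"), which shows how the exact normalization rule affects the distinct-entity count.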
