Machine learning classification of surgical pathology reports and chunk recognition for information extraction noise reduction

Napolitano, Giulio; Marshall, A.H.; Hamilton, Peter; Gavin, Anna

doi:10.1016/j.artmed.2016.06.001

Cited by 40 publications (25 citation statements)

References 16 publications

“…With ML models, it can also be possible to improve quality of medical data, reduce fluctuations in patient rates, and save in medical costs. Therefore, these models are frequently used to investigate diagnostic analysis when compared with other conventional methods [10]. To reduce the death rates caused by chronic diseases (CDs), early detection and effective treatments are the only solutions [11].…”

Section: Introduction · mentioning (confidence: 99%)

Applications of Machine Learning Predictive Models in the Chronic Disease Diagnosis

Battineni, Sagaro, Chinatalapudi et al., 2020 · JPM

239

This paper reviews applications of machine learning (ML) predictive models in the diagnosis of chronic diseases. Chronic diseases (CDs) are responsible for a major portion of global health costs. Patients who suffer from these diseases need lifelong treatment. Nowadays, predictive models are frequently applied in the diagnosis and forecasting of these diseases. In this study, we reviewed the state-of-the-art approaches that encompass ML models in the primary diagnosis of CD. This analysis covers 453 papers published between 2015 and 2019; our document search was conducted in the PubMed (Medline) and Cumulative Index to Nursing and Allied Health Literature (CINAHL) libraries. Ultimately, 22 studies were selected to present all modeling methods in a precise way that explains CD diagnosis and usage models of individual pathologies with associated strengths and limitations. Our outcomes suggest that there is no standard method to determine the best approach in real-time clinical practice, since each method has its advantages and disadvantages. Among the methods considered, support vector machines (SVM), logistic regression (LR), and clustering were the most commonly used. These models are highly applicable in the classification and diagnosis of CD and are expected to become more important in medical practice in the near future.

“…Despite near-universal electronic medical record use, pathology reports remain as free text containing semistructured elements detailing a specimen's source and gross and microscopic characteristics. Although other groups have developed tools to aid in parsing the cancer type or tumor characteristics from these reports, such as to identify relevant patients for registry inclusion [1][2][3][4] or to determine TNM staging, [5][6][7][8][9] none, to our knowledge, have attempted to extract and group semistructured specimen identifiers themselves.…”

Section: Introduction · mentioning (confidence: 99%)

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics

Oliwa, Maron, Chase et al., 2019 · JCO Clinical Cancer Informatics

PURPOSE: Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions.

PATIENTS AND METHODS: Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step.

RESULTS: We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance.

CONCLUSION: Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
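The abstract reports class-specific precision, recall, and F1 for the named-entity recognition step without stating how matches are counted. As a minimal sketch, assuming strict entity-level matching (an extracted entity counts as correct only if its report, class, and text all agree with the gold annotation), per-class scores could be computed as below. The function name `per_class_prf`, the class names `LABEL` and `DATE`, and the accession strings are hypothetical and invented for illustration; they are not taken from Oliwa et al.

```python
def per_class_prf(gold, predicted):
    """Entity-level precision, recall and F1 for each entity class.

    gold, predicted: iterables of (report_id, entity_class, text) tuples.
    A prediction is a true positive only if all three fields match a gold
    entity exactly (strict matching; an assumption, not the paper's rule).
    """
    gold_set, pred_set = set(gold), set(predicted)
    classes = sorted({cls for _, cls, _ in gold_set | pred_set})
    scores = {}
    for cls in classes:
        g = {e for e in gold_set if e[1] == cls}   # gold entities of this class
        p = {e for e in pred_set if e[1] == cls}   # predicted entities of this class
        tp = len(g & p)                            # exact matches
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[cls] = (precision, recall, f1)
    return scores


# Invented annotations for two hypothetical reports.
gold = [("r1", "LABEL", "S19-0001"), ("r1", "DATE", "2019-01-02"),
        ("r2", "LABEL", "S19-0002")]
predicted = [("r1", "LABEL", "S19-0001"), ("r1", "DATE", "2019-01-02"),
             ("r2", "LABEL", "S19-9999")]
print(per_class_prf(gold, predicted))
```

Under this strict scheme a partially correct accession number (right report, wrong digits) counts as both a false positive and a false negative, which is why the hypothetical `LABEL` class scores 0.5 here; a lenient, overlap-based scheme would score it differently.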

“…The accuracy of free text interpretations in pathology varies substantially; it can be nearly perfect (99% accurate) or be quite poor (65%). [2] Seen practically, in the context of analyzing free-text pathology reports, this may limit analysis work on conditions that have a prevalence lower than the error rate.…”

Section: Introduction · mentioning (confidence: 99%)

Next Generation Quality: Assessing the Physician in Clinical History Completeness and Diagnostic Interpretations Using Funnel Plots and Normalized Deviations Plots in 3,854 Prostate Biopsies

Bonert, El-Shinnawy, Carvalho et al., 2017 · Journal of Pathology Informatics

Background: Observational data and funnel plots are routinely used outside of pathology to understand trends and improve performance.

Objective: Extract diagnostic rate (DR) information from free text surgical pathology reports with synoptic elements and assess whether inter-rater variation and clinical history completeness information useful for continuous quality improvement (CQI) can be obtained.

Methods: All in-house prostate biopsies in a 6-year period at two large teaching hospitals were extracted and then diagnostically categorized using string matching, fuzzy string matching, and hierarchical pruning. DRs were then stratified by the submitting physicians and pathologists. Funnel plots were created to assess for diagnostic bias.

Results: 3,854 prostate biopsies were found and all could be diagnostically classified. Two audits involving the review of 700 reports and a comparison of the synoptic elements with the free text interpretations suggest a categorization error rate of <1%. Twenty-seven pathologists each read >40 cases and together assessed 3,690 biopsies. There was considerable inter-rater variability and a trend toward more World Health Organization/International Society of Urologic Pathology Grade 1 cancers in older pathologists. Normalized deviations plots, constructed using the median DR, and standard error can elucidate associated over- and under-calls for an individual pathologist in relation to their practice group. Clinical history completeness by submitting medical doctor varied significantly (100% to 22%).

Conclusion: Free text data analyses have some limitations; however, they could be used for data-driven CQI in anatomical pathology, and could lead to the next generation in quality of care.
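The normalized deviations idea can be illustrated with a short sketch. It assumes a binomial standard error evaluated at the group median diagnostic rate, z = (DR_i - median DR) / sqrt(p0 (1 - p0) / n_i), where p0 is the median DR and n_i the number of cases read, plus normal-approximation funnel limits at ±1.96 standard errors. The exact construction used by Bonert et al. may differ; the function names `normalized_deviations` and `funnel_limits` and the pathologist counts are hypothetical.

```python
import math
from statistics import median


def normalized_deviations(counts):
    """Map each pathologist to (diagnostic rate, normalized deviation).

    counts: dict of pathologist -> (positive_cases, total_cases).
    The deviation is (rate - group median rate) divided by the binomial
    standard error at the median rate for that pathologist's case volume.
    """
    rates = {k: pos / n for k, (pos, n) in counts.items()}
    p0 = median(rates.values())
    if not 0 < p0 < 1:
        raise ValueError("median rate must lie strictly between 0 and 1")
    result = {}
    for k, (_, n) in counts.items():
        se = math.sqrt(p0 * (1 - p0) / n)  # binomial SE at the median rate
        result[k] = (rates[k], (rates[k] - p0) / se)
    return result


def funnel_limits(p0, n, z=1.96):
    """Normal-approximation funnel limits around p0 for a volume of n cases."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return max(0.0, p0 - z * se), min(1.0, p0 + z * se)


# Invented counts: (positive diagnoses, biopsies read) per hypothetical pathologist.
counts = {"A": (20, 100), "B": (25, 100), "C": (40, 100)}
for name, (rate, z) in normalized_deviations(counts).items():
    print(name, round(rate, 2), round(z, 2))
print(funnel_limits(0.25, 100))
```

In this toy example pathologist C's rate of 0.40 falls outside the 95% funnel for 100 cases around a median of 0.25, which is the kind of over-call such a plot is meant to flag, while A and B fall inside it.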
