2019
DOI: 10.1200/cci.19.00006

Enhancing Case Capture, Quality, and Completeness of Primary Melanoma Pathology Records via Natural Language Processing

Abstract: PURPOSE Medical records contain a wealth of useful, informative data points valuable for clinical research. Most data points are stored in semistructured or unstructured legacy documents and require manual data abstraction into a structured format to render the information more readily accessible, searchable, and generally analysis ready. The substantial labor needed for this can be cost prohibitive, particularly when dealing with large patient cohorts. METHODS To establish a high-throughput approach to data a…

Cited by 13 publications (11 citation statements)
References 10 publications
“…However, with the improvement of the digital hospital and laboratory information system and the advent of machine learning based on deep neural networks, an approach may be a viable solution to generate detailed data in high-volume capacity (35, 36). It is convenient to capture all factors relating to the scores in the electronic medical records (EMR), calculate them automatically, and then quickly preset the predicting risk of patients in the system in auto (37, 38). It should be noted that dichotomizing continuous risk scores into a regression model might not be the optimal choice, which could induce a potential risk of inaccuracy.…”
Section: Discussion (mentioning)
confidence: 99%
“…An appealing application of ML, specifically NLP, to study data management is to automate data collection into case report forms, decreasing the time, expense, and potential for error associated with human data extraction, whether in prospective trials or retrospective reviews. Though this use requires overcoming variable data structures and provenances, it has shown early promise in cancer [43, 44], epilepsy [30], and depression [45], among other areas [29]. Regardless of how data have been collected, ML can power risk-based monitoring approaches to clinical trial surveillance, enabling the prevention and/or early detection of site failure, fraud, and data inconsistencies or incompleteness that may delay database lock and subsequent analysis.…”
Section: Data Collection and Management (mentioning)
confidence: 99%
“…Because reviewing more than 20 000 pathology reports to generate our present study cohort would require immense amounts of time and human labor, we applied the NLP algorithm during the data set construction phase. Previously, Malke et al 22 developed an NLP platform that can identify and abstract melanoma primary prognostic factors with a less than 5% error rate compared with manual extraction, resulting in enormous improvement in efficiency. Our NLP algorithm is capable of automatically parsing pathology reports and can extract more than 97.7% of results with 100% precision, thus significantly reducing workload and allowing for large-scale patient analysis to be performed.…”
Section: Discussion (mentioning)
confidence: 99%
“…Natural language processing (NLP), which enables computers to process the free-text sections automatically using knowledge-based approaches and/or machine learning algorithms, has shown potential in identifying primary melanoma cases from electronic health records and extracting histopathologic characteristics from pathology reports. 22 Previous studies have demonstrated that NLP is an effective approach for case identification and may reduce the number of manual reviews in oncology 23 , 24 and other medical domains. 25 , 26 In this study, we aimed to assess the prognostic significance of TILs in patients with cutaneous melanoma using a large patient cohort with clinical and histopathologic characteristics identified using NLP techniques.…”
Section: Introduction (mentioning)
confidence: 99%