Purpose: To develop a Natural Language Processing (NLP) pipeline that can determine the histological subtype and site of a patient's cancer from pathology reports.

Methods: A deep learning pipeline based on Spark NLP was developed to perform named entity recognition (NER) and assertion status detection for histological subtypes, and then to extract key relations of interest that determine the site of the patient's cancer. We evaluated the pipeline's extraction of histological subtypes and cancer site against manual curation of the pathology reports.

Results: A total of 1358 reports from 474 patients seen at a single tertiary cancer centre were used to develop and validate the pipeline. For NER of histological subtypes, the pipeline achieved a mean accuracy of 99.79% and an F1 score of 84.08%. The relation extraction (RE) model achieved an average accuracy of 91.96% and an F1 score of 92.45% for key relations involving histological subtype entities.

Conclusion: We developed an NLP pipeline that extracts histological subtypes from free-text pathology reports and relates them to the site of the patient's cancer with high accuracy. It has the potential to be deployed in both research and clinical quality processes.
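The NER results pair a near-ceiling accuracy (99.79%) with a noticeably lower F1 score (84.08%). The abstract does not state how either metric was computed. One common explanation, assumed here rather than taken from the study, is that accuracy is counted per token, where the many non-entity ("O") tokens dominate, while F1 is counted over exact entity spans. The minimal sketch below uses a hypothetical BIO tag set (`HIST` for histological subtype) and a toy sentence; the helper names `bio_to_spans`, `token_accuracy` and `entity_prf` are illustrative and are not part of Spark NLP or the authors' code.

```python
from typing import List, Optional, Set, Tuple

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence.

    An I- tag whose label differs from the open entity (or that follows an O)
    is treated leniently as the start of a new entity.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        opens_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or opens_new:
            if start is not None:
                spans.add((start, i, label))
            start, label = (i, tag[2:]) if opens_new else (None, None)
    return spans


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity precision, recall and F1."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy sentence with hypothetical HIST (histological subtype) labels.
tokens = "Sections show invasive ductal carcinoma and ductal carcinoma in situ .".split()
gold = ["O", "O", "B-HIST", "I-HIST", "I-HIST", "O", "B-HIST", "I-HIST", "I-HIST", "I-HIST", "O"]
# The model misses "invasive", so the first entity's left boundary is wrong.
pred = ["O", "O", "O", "B-HIST", "I-HIST", "O", "B-HIST", "I-HIST", "I-HIST", "I-HIST", "O"]
assert len(tokens) == len(gold) == len(pred)

print(f"token accuracy: {token_accuracy(gold, pred):.3f}")  # token accuracy: 0.818
print("entity P/R/F1:", entity_prf(gold, pred))  # entity P/R/F1: (0.5, 0.5, 0.5)
```

A single boundary error costs two of eleven tokens but half of the entities. In full pathology reports the share of "O" tokens is far larger than in this toy sentence, which pushes token accuracy toward 1 while entity-level F1 remains sensitive to boundary mistakes. Whether the study used strict or lenient span matching is not stated in the abstract.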