The unstructured nature of Real-World (RW) data from onco-hematological patients and the limited access to integrated systems restrict the use of RW information for research purposes. Natural Language Processing (NLP) might help in transposing unstructured reports into standardized electronic health records. We exploited NLP to develop an automated tool, named ARGO (Automatic Record Generator for Onco-hematology), to recognize information from pathology reports and populate electronic case report forms (eCRFs) pre-implemented in REDCap. ARGO was applied to hemo-lymphopathology reports of diffuse large B-cell, follicular, and mantle cell lymphomas, and assessed for accuracy (A), precision (P), recall (R) and F1-score (F) on internal (n = 239) and external (n = 93) report series. In total, 326 reports (98.2%) were converted into corresponding eCRFs. Overall, ARGO showed high performance in capturing (1) identification report number (all metrics > 90%), (2) biopsy date (all metrics > 90% in both series), (3) specimen type (A of 86.6% and 91.4%, P of 98.5% and 100.0%, F of 92.5% and 95.5%, and R of 87.2% and 91.4% for internal and external series, respectively), and (4) diagnosis (P of 100%, with A, R and F of 90% in both series). We developed and validated a generalizable tool that generates structured eCRFs from real-life pathology reports.
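For readers less familiar with the four metrics reported above, the following is a minimal sketch of how they are conventionally computed from a confusion matrix. The `Counts` class and the function names are illustrative assumptions, not part of ARGO, and the abstract does not state how the authors counted true and false positives per field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for one extracted field (hypothetical helper)."""
    tp: int  # field extracted and correct
    fp: int  # field extracted but wrong
    fn: int  # field present in the report but missed
    tn: int  # field absent and correctly left empty


def accuracy(c: Counts) -> float:
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Counts) -> float:
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Counts) -> float:
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def f1(c: Counts) -> float:
    return f1_from(precision(c), recall(c))
```

As a consistency check, the specimen-type figures quoted above agree with the harmonic-mean definition: `f1_from(0.985, 0.872)` rounds to 0.925 and `f1_from(1.0, 0.914)` rounds to 0.955, matching the reported F of 92.5% and 95.5%.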
BACKGROUND The unstructured nature of medical data from Real-World (RW) patients and researchers' limited access to integrated systems restrict the use of RW information for clinical and translational research purposes. Natural Language Processing (NLP) might help in transposing unstructured reports into electronic health records (EHR), thus promoting their standardization and sharing.
OBJECTIVE We aimed to design a tool that captures pathological features directly from hemo-lymphopathology reports and automatically records them into electronic case report forms (eCRFs).
METHODS We exploited Optical Character Recognition and NLP techniques to develop a web application, named ARGO (Automatic Record Generator for Oncology), that recognizes unstructured information from diagnostic paper-based reports of diffuse large B-cell lymphomas (DLBCL), follicular lymphomas (FL), and mantle cell lymphomas (MCL). ARGO was programmed to match data with standard diagnostic criteria of the National Institute of Health, automatically assign a diagnosis and, via Application Programming Interface, populate specific eCRFs on the REDCap platform, according to the College of American Pathologists templates. A selection of 239 reports (n = 106 DLBCL, n = 79 FL, and n = 54 MCL) from the Pathology Unit at the IRCCS - Istituto Tumori “Giovanni Paolo II” of Bari (Italy) was used to assess ARGO performance in terms of accuracy, precision, recall and F1-score.
RESULTS By applying our workflow, we successfully converted 233 paper-based reports into corresponding eCRFs incorporating structured information about diagnosis, tissue of origin and anatomical site of the sample, major molecular markers and cell-of-origin subtype. Overall, ARGO showed high performance (nearly 90% of accuracy, precision, recall and F1-score) in capturing identification report number, biopsy date, specimen type, diagnosis, and additional molecular features.
CONCLUSIONS We developed and validated an easy-to-use tool that converts RW paper-based diagnostic reports of major lymphoma subtypes into structured eCRFs. ARGO is cheap, feasible, and easily transferable into daily practice to generate REDCap-based EHR for clinical and translational research purposes.
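The METHODS step of populating eCRFs on REDCap through its Application Programming Interface can be illustrated with a short sketch. It assumes the standard REDCap "Import Records" call with a flat JSON payload; the function name and the eCRF field names (`record_id`, `biopsy_date`, `specimen_type`, `diagnosis`) are hypothetical and do not come from the ARGO project, whose actual data dictionary is not given here.

```python
import json
from typing import Dict, List


def build_redcap_import_payload(token: str, records: List[Dict[str, str]]) -> Dict[str, str]:
    """Build form fields for a REDCap 'Import Records' request.

    The keys below follow the REDCap API's record-import parameters.
    The record field names passed in by the caller are hypothetical.
    """
    return {
        "token": token,            # project-specific API token
        "content": "record",       # import/export of records
        "format": "json",          # format of the 'data' field
        "type": "flat",            # one row per record
        "data": json.dumps(records),
        "returnContent": "count",  # ask REDCap to return the number of imported records
    }


# Example record, as might be extracted from one pathology report (values invented).
example_record = {
    "record_id": "1",
    "biopsy_date": "2020-01-15",
    "specimen_type": "lymph node",
    "diagnosis": "DLBCL",
}
payload = build_redcap_import_payload("YOUR_API_TOKEN", [example_record])
```

The payload would then be sent as form data in an HTTP POST to the project's REDCap API endpoint, for instance with `requests.post(api_url, data=payload)`, where `api_url` is the address of the local REDCap installation. Building the payload separately from sending it keeps the extraction logic testable without network access.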