The introduction of deep learning-based pre-trained language models in natural language processing (NLP), together with the availability of electronic health records (EHRs), presents a great opportunity to transfer the “knowledge” learned from general-domain data to the analysis of unstructured textual data in clinical domains. This study explored the feasibility of applying NLP to a small EHR dataset to investigate the power of transfer learning to facilitate patient screening in psychiatry. A total of 500 patients were randomly selected from a medical center database. Three annotators with clinical experience reviewed the notes and assigned diagnoses of major/minor depression, bipolar disorder, schizophrenia, and dementia, forming a small and highly imbalanced corpus. Several state-of-the-art deep learning-based NLP methods, along with pre-trained models based on shallow or deep transfer learning, were adapted to develop models that classify these diseases. We hypothesized that models relying on transferred knowledge would outperform models learned from scratch. The experimental results demonstrated that the models with pre-trained techniques outperformed the models without transferred knowledge by 0.11 in micro-averaged F-score and 0.28 in macro-averaged F-score. Our results also suggest that building multi-label models with the feature dependency strategy, rather than with problem transformation, is preferable given its higher performance and simpler training process.
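Because the corpus is highly imbalanced, the micro- and macro-averaged F-scores can move by very different amounts. The sketch below shows how the two averages are computed for a multi-label task. The four label names follow the abstract (major/minor depression merged into one label for brevity); the patients and predictions are invented toy data, not results from the study. In this toy case, getting two rare labels right raises the macro average far more than the micro average, which is one plausible way to read a larger macro-averaged gain on an imbalanced corpus.

```python
LABELS = ["depression", "bipolar", "schizophrenia", "dementia"]

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; 0.0 when the label is neither present nor predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(gold: list[set[str]], pred: list[set[str]]) -> tuple[float, float]:
    """gold and pred hold one set of diagnosis labels per patient."""
    counts = {label: [0, 0, 0] for label in LABELS}  # per label: [tp, fp, fn]
    for g, p in zip(gold, pred):
        for label in LABELS:
            if label in g and label in p:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    # Micro: pool the counts of all labels, so frequent labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    # Macro: average the per-label F1, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(LABELS)
    return micro, macro

# Hypothetical toy data: depression is frequent, the other labels are rare.
gold = [{"depression"}, {"depression"}, {"depression"},
        {"depression", "dementia"}, {"bipolar"}, {"schizophrenia"}, set()]
# Model A only ever predicts the frequent label.
pred_a = [{"depression"}, {"depression"}, {"depression"},
          {"depression"}, set(), set(), set()]
# Model B additionally recovers two rare labels.
pred_b = [{"depression"}, {"depression"}, {"depression"},
          {"depression"}, {"bipolar"}, {"schizophrenia"}, set()]

micro_a, macro_a = micro_macro_f1(gold, pred_a)
micro_b, macro_b = micro_macro_f1(gold, pred_b)
print(f"A: micro={micro_a:.3f} macro={macro_a:.3f}")  # A: micro=0.727 macro=0.250
print(f"B: micro={micro_b:.3f} macro={macro_b:.3f}")  # B: micro=0.923 macro=0.750
```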
Cancer registries are critical databases for cancer research whose maintenance requires various types of domain knowledge and labor-intensive data curation. To support timely, high-quality curation, we developed a hybrid neural symbolic system for cancer registry coding. Unlike previous works, which mainly used a dataset collected from a single hospital or formulated the task as a text classification problem, we collaborated with two medical centers in Taiwan to compile a cross-hospital corpus, applied neural networks to extract cancer registry variables described in unstructured pathology reports, and combined them with expert systems to generate registry codes. We conducted experiments to study the feasibility of the proposed hybrid system for cancer registry coding and compared its performance with state-of-the-art non-hybrid approaches. Furthermore, cross-hospital experiments were performed to study the advantages and limitations of transfer learning for processing reports from different sources. The experimental results demonstrated that the proposed hybrid neural symbolic system is a robust approach that works well across hospitals and outperforms classification-based baselines by 0.13 to 0.27 in F-score. Compared with the baseline models, the F-scores of the proposed approaches are clearly higher when fewer training instances are used. All methods benefited from the parameters transferred from the source dataset, but the results suggest that transferring the learned knowledge through the concept recognition task, followed by the symbolic expert system, is the better strategy for cancer registry coding.
INDEX TERMS Electronic medical records, medical expert systems, medical information systems, natural language processing
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.
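The hybrid design separates two stages: a neural model recognizes registry concepts in the report text, and a symbolic rule layer turns those concepts into codes. The sketch below illustrates only that division of labor. The `recognize_concepts` function is a hypothetical stand-in for the trained neural recognizer (a regex is used purely so the example runs), and the variable names and both coding rules (largest dimension in millimeters; conflicting laterality flagged for review) are illustrative assumptions, not the registry's actual coding manual or the authors' expert system.

```python
import re
from dataclasses import dataclass

@dataclass
class Concept:
    variable: str  # registry variable the span was tagged with (hypothetical names)
    text: str      # surface string extracted from the report

def recognize_concepts(report: str) -> list[Concept]:
    """Hypothetical stand-in for the neural concept recognizer.

    A trained sequence tagger would go here; regexes are used only so the
    sketch runs end to end.
    """
    concepts = []
    for m in re.finditer(r"\d+(?:\.\d+)?\s*(?:cm|mm)", report):
        concepts.append(Concept("tumor_size", m.group()))
    for m in re.finditer(r"\b(left|right)\b", report, flags=re.IGNORECASE):
        concepts.append(Concept("laterality", m.group().lower()))
    return concepts

def to_mm(text: str) -> float:
    """Normalize a size mention such as '2.3 cm' to millimeters."""
    value, unit = re.match(r"(\d+(?:\.\d+)?)\s*(cm|mm)", text).groups()
    return float(value) * (10 if unit == "cm" else 1)

def apply_rules(concepts: list[Concept]) -> dict[str, str]:
    """Symbolic layer: hypothetical coding rules over the recognized concepts."""
    codes = {}
    sizes = [to_mm(c.text) for c in concepts if c.variable == "tumor_size"]
    if sizes:
        # Rule (assumed): code the largest dimension, rounded to whole millimeters.
        codes["tumor_size_mm"] = str(round(max(sizes)))
    sides = {c.text for c in concepts if c.variable == "laterality"}
    # Rule (assumed): a single consistent side is coded; conflicts go to manual review.
    if len(sides) == 1:
        codes["laterality"] = sides.pop()
    elif len(sides) > 1:
        codes["laterality"] = "review"
    return codes

report = "Left breast: invasive carcinoma measuring 2.3 cm; a second focus measures 0.8 cm."
print(apply_rules(recognize_concepts(report)))  # {'tumor_size_mm': '23', 'laterality': 'left'}
```

Keeping the rules outside the network means a coding-rule change only touches `apply_rules`, while the recognizer can be retrained or transferred to a new hospital independently.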
A cancer registry is a critical and massive database for which various types of domain knowledge are needed and whose maintenance requires labor-intensive data curation. To facilitate the curation process for building a high-quality, integrated cancer registry database, we compiled a cross-hospital corpus and applied neural network methods to develop a natural language processing system for extracting cancer registry variables buried in unstructured pathology reports. The performance of the developed networks was compared with various baselines using standard micro-averaged precision, recall, and F-measure. Furthermore, we conducted experiments to study the feasibility of applying transfer learning to rapidly develop a well-performing system for processing reports from different sources, which may differ in writing style and format. The results demonstrate that the transfer learning method enables us to develop a satisfactory system for a new hospital with only a few annotations, and suggest further opportunities to reduce the burden of cancer registry curation.
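One common form of transfer learning is parameter transfer: train on the source hospital's reports, then use those weights to initialize a model that is fine-tuned on the few annotated reports from the new hospital. The sketch below shows that warm-start pattern with a tiny logistic regression in place of the neural networks described above. The three binary features, both datasets, and the hyperparameters are hypothetical; the example demonstrates the mechanics only and makes no claim about how much transfer helps.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(w: list[float], b: float, x: list[int]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def log_loss(w: list[float], b: float, data) -> float:
    """Mean binary cross-entropy over (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(data, w, b, lr=0.5, epochs=200):
    """Full-batch gradient descent for logistic regression, starting from (w, b)."""
    w = list(w)  # copy so the caller's (e.g. source) weights are not modified
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b

# Hypothetical binary features, e.g. whether three cue phrases occur in a report.
source = [([1, 0, 0], 1), ([1, 1, 0], 1), ([1, 0, 1], 1), ([1, 0, 0], 0),
          ([0, 1, 0], 0), ([0, 0, 1], 0), ([0, 1, 1], 0), ([1, 1, 1], 0), ([0, 0, 0], 0)]
# Only a few annotated reports from the new (target) hospital.
target = [([0, 1, 0], 1), ([0, 0, 0], 0), ([1, 0, 0], 1)]

src_w, src_b = train(source, [0.0, 0.0, 0.0], 0.0)   # learn on the source hospital
tuned_w, tuned_b = train(target, src_w, src_b, epochs=20)  # warm start, brief fine-tuning

print("target loss with source weights:", round(log_loss(src_w, src_b, target), 4))
print("target loss after fine-tuning:  ", round(log_loss(tuned_w, tuned_b, target), 4))
```

Starting from the source weights rather than zeros is the whole transfer step here; with a neural model the same idea amounts to loading the source model's parameters before fine-tuning on target annotations.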