The provision of personalized patient information has been encouraged as a means of complementing the information provided during patient-doctor consultations, and has been linked to better health outcomes through improved patient compliance with prescribed treatments. Generating such texts in a controlled fragment of Runyankore, a Bantu language indigenous to Uganda, requires selecting the appropriate tense and aspect, as well as a method for verb conjugation. We present how an analysis of corpora of explanations of prescribed medications was used to identify the simple present tense and the progressive aspect as appropriate for our selected domain. A context-free grammar (CFG) is then defined to conjugate verbs and generate the correct verb form.
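As an illustration of how a CFG can drive verb generation, here is a minimal sketch. It assumes the verb template commonly described for Bantu languages (subject marker, tense/aspect marker, root, final vowel). The grammar, the nonterminal names, and every morpheme label below are placeholders chosen for illustration; they are not the authors' grammar and not real Runyankore morphemes.

```python
from itertools import product

# Toy CFG: each nonterminal maps to a list of productions, and each
# production is a list of symbols. Any symbol that is not a key of the
# dictionary is treated as a terminal.
# ASSUMPTION: all terminals are placeholder labels, not Runyankore morphemes.
GRAMMAR = {
    "VERB": [["SUBJ", "TAM", "ROOT", "FV"]],
    "SUBJ": [["SM1."], ["SM2."]],    # hypothetical subject markers
    "TAM": [["PRES."], ["PROG."]],   # simple present vs. progressive slot
    "ROOT": [["root"]],              # hypothetical verb root
    "FV": [[".FV"]],                 # hypothetical final vowel
}


def expand(symbol, grammar):
    """Return every terminal string derivable from `symbol`."""
    if symbol not in grammar:
        return [symbol]  # terminal: derives only itself
    results = []
    for production in grammar[symbol]:
        # Expand each symbol of the production, then concatenate
        # every combination of the resulting alternatives.
        parts = [expand(s, grammar) for s in production]
        results.extend("".join(combo) for combo in product(*parts))
    return results


forms = expand("VERB", GRAMMAR)
for form in forms:
    print(form)
```

With this toy grammar the call enumerates four forms, one per combination of subject marker and tense/aspect slot. A realistic grammar would also need agreement constraints, for example tying the subject marker to a noun class, which plain enumeration like this does not enforce.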
There are many domain-specific and language-specific natural language generation (NLG) systems, some of which may be adaptable across related domains and languages. The languages in the Bantu language family, however, have their own set of features distinct from other major language groups, which severely limits the options for bootstrapping an NLG system from existing ones. We present here our first proof-of-concept application for knowledge-to-text NLG as a plugin to the Protégé 5.x ontology development system, tailored to Runyankore, a Bantu language indigenous to Uganda. It comprises a basic annotation model for linguistic information such as noun class, an implementation of existing verbalisation rules and a CFG for verbs, and a basic interface for data entry.
Background: Advances in machine learning (ML) for biomedical research have led to ground-breaking results that are evident in several healthcare settings. In the field of natural language processing (NLP), there are applications for text mining, named entity recognition, classification of pathology and radiology reports, and text generation. Despite these advancements, the main barrier to the use of AI systems in a clinical setting is their lack of explainability and interpretability.
Objective: Of the various avenues available for creating transparent and explainable ML models, we investigated how a stable, accurate, and trusted biomedical standard, the Unified Medical Language System (UMLS), can be applied to retrospectively justify and explain the results of ML models.
Methods: We developed a novel architecture that places a UMLS-based system after the ML model. This system acts as a verifier that confirms the accuracy, or lack thereof, of the model's results, and then explains those results. The architecture is intended to be model-agnostic, so we evaluated its effectiveness on two NLP tasks: classification and Named Entity Recognition (NER). For classification, the UMLS-based verifier was applied to the results of a Multi-Task Convolutional Neural Network (MT-CNN) that classified the topographies in 1964 unstructured and anonymized breast cancer pathology reports. For NER, the UMLS-based verifier was applied to the results of the HunFlair model on unstructured and anonymized breast, colon, and small intestine cancer pathology reports.
Results: For the classification evaluation, we found that an entity's National Cancer Institute Thesaurus (NCIt) code can be used to obtain a topographical range for individual entities in a pathology report. We further found that, whilst there are entities whose topographical range contributes positively towards a report's overall topography classification, there are also entities that contribute negatively, and that the number of these negatively contributing entities is inversely proportional to the confidence value from the ML model. For the NER evaluation, we found that the UMLS-based verifier is able both to confirm accurate model annotations and to group together the different kinds of inaccuracies found. Additionally, the grouping of an incorrectly tagged entity was found to be correlated with lower confidence values from the model.
Conclusion: The architecture we propose retrospectively verifies and explains the results of ML models, thus providing a level of interpretability to a model’s outputs. Our use of an industry-standard healthcare knowledge repository, the UMLS, is an important contribution towards trusting the results of AI systems in healthcare.
Citation Format: Joan Byamugisha, Waheeda Saib, Theodore Gaelejwe, Asad Jeewa, Maletsabisa Molapo. Towards verifying results from biomedical deep learning models using the UMLS: Cases of primary tumor site classification and cancer Named Entity Recognition [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PR-12.
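The post-hoc verifier idea in the abstract above (a knowledge-based check placed after the ML model) can be sketched roughly as follows. This is only an illustration under stated assumptions: the lookup table stands in for a real UMLS/NCIt query, the `C50`-style topography prefixes are an assumed code format, and `verify`, `Verdict`, and `TOPOGRAPHY_RANGE` are hypothetical names that do not come from the authors' system.

```python
from dataclasses import dataclass, field

# ASSUMPTION: a tiny hand-written table standing in for a UMLS/NCIt lookup.
# It maps an entity mention to the topography code prefixes it is
# compatible with. A real verifier would query the UMLS instead.
TOPOGRAPHY_RANGE = {
    "breast": {"C50"},
    "nipple": {"C50"},
    "colon": {"C18"},
}


@dataclass
class Verdict:
    supported: bool
    positive: list = field(default_factory=list)  # entities agreeing with the prediction
    negative: list = field(default_factory=list)  # entities contradicting it


def verify(predicted_code, entities, ranges=TOPOGRAPHY_RANGE):
    """Check a model's predicted topography against entities found in a report."""
    prefix = predicted_code.split(".")[0]  # e.g. "C50.4" -> "C50"
    positive, negative = [], []
    for entity in entities:
        codes = ranges.get(entity.lower())
        if codes is None:
            continue  # entity has no known topographical range; ignore it
        if prefix in codes:
            positive.append(entity)
        else:
            negative.append(entity)
    # Simple majority rule; the decision rule itself is an assumption.
    return Verdict(
        supported=len(positive) > len(negative),
        positive=positive,
        negative=negative,
    )


verdict = verify("C50.4", ["breast", "colon", "nipple", "margin"])
print(verdict)
```

Because the verifier only consumes the model's output and the report's entities, it does not depend on the model's internals, which is the sense in which such a design can be model-agnostic. The lists of agreeing and contradicting entities double as a human-readable explanation of the verdict.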