We introduce an extensible and modifiable knowledge representation model to represent cancer disease characteristics in a comparable and consistent fashion. We describe a system, MedTAS/P, which automatically instantiates the knowledge representation model from free-text pathology reports. MedTAS/P is based on an open-source framework, and its components use natural language processing principles, machine learning, and rules to discover and populate elements of the model. To validate the model and measure the accuracy of MedTAS/P, we developed a gold-standard corpus of manually annotated colon cancer pathology reports. MedTAS/P achieves F1-scores of 0.97-1.0 for instantiating classes in the knowledge representation model, such as histologies or anatomical sites, and F1-scores of 0.82-0.93 for primary tumors or lymph nodes, which require the extraction of relations. An F1-score of 0.65 is reported for metastatic tumors; this lower score is predominantly due to the very small number of instances in the training and test sets.
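The F1-scores above are the harmonic mean of precision P and recall R, F1 = 2PR/(P+R), which also equals 2TP/(2TP+FP+FN). The following is a minimal sketch of that arithmetic from raw counts of true positives, false positives, and false negatives; the function name and the counts in the usage line are hypothetical and are not figures from the MedTAS/P evaluation.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for illustration only; not data from the paper.
p, r, f1 = precision_recall_f1(tp=45, fp=5, fn=10)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # -> P=0.90 R=0.82 F1=0.86
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which is why a handful of missed or spurious instances can pull it down sharply when a class has very few examples.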
Although structured electronic health records are becoming more prevalent, much information about patient health is still recorded only in unstructured text. "Understanding" these texts has been a focus of natural language processing (NLP) research for many years, with some remarkable successes, yet there is more work to be done. Knowing which drugs patients take is critical not only for understanding patient health (e.g., drug-drug or drug-enzyme interactions) but also for secondary uses, such as research on treatment effectiveness. Several drug dictionaries have been curated, such as RxNorm, the FDA's Orange Book, or NCI, with a focus on prescription drugs. Developing these dictionaries is a challenge, but keeping them up-to-date in a rapidly advancing field is even more challenging: it is critical to identify grapefruit as a "drug" for a patient who takes the prescription medicine Lipitor, because of their known adverse interaction. To discover new adverse drug interactions, a large number of patient histories often need to be examined, which requires algorithms that identify pharmacological substances not only accurately but also quickly. In this paper we propose a new algorithm, SPOT, which identifies drug names in a large corpus that can be used as new dictionary entries, where a "drug" is defined as a substance intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease. We report precision and recall values for SPOT, measured against a manually annotated reference corpus. SPOT is language- and syntax-independent, and it can be run efficiently to keep dictionaries up-to-date and to suggest words and phrases that may be misspellings or uncatalogued synonyms of known drugs. We show how SPOT's lack of reliance on NLP tools makes it robust in analyzing clinical medical text.
SPOT is a generalized bootstrapping algorithm that is seeded with a known dictionary and automatically extracts the context in which each drug is mentioned. We define three features of such context: support, confidence, and prevalence. Finally, we present the performance tradeoffs that result from the thresholds chosen for these features.
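SPOT's exact definitions of support, confidence, and prevalence are not given here, so the following is only a minimal sketch of generic dictionary-seeded bootstrapping under assumed, association-rule-style definitions: a context is the pair of words immediately left and right of a single-token term; a context's support is how many of its occurrences surround an already-known drug; its confidence is that count divided by all of its occurrences; and a candidate's prevalence is the number of distinct accepted contexts it fills. The function names, thresholds, and toy corpus are hypothetical, and this is not presented as SPOT's implementation.

```python
from collections import Counter, defaultdict

def contexts(sentences):
    """Yield ((left_word, right_word), term) for every interior token."""
    for tokens in sentences:
        for i in range(1, len(tokens) - 1):
            yield (tokens[i - 1], tokens[i + 1]), tokens[i]

def bootstrap(sentences, seeds, min_support=2, min_confidence=0.5,
              min_prevalence=2, max_iter=5):
    """Hypothetical dictionary-seeded bootstrapping; not SPOT's actual code.

    ASSUMPTION: support, confidence and prevalence follow the illustrative
    definitions in the comments below, not the SPOT paper's definitions.
    """
    lexicon = {s.lower() for s in seeds}
    occurrences = list(contexts([[t.lower() for t in s] for s in sentences]))
    total = Counter(p for p, _ in occurrences)  # all occurrences per context
    for _ in range(max_iter):
        # Support: occurrences of a context whose middle term is already known.
        support = Counter(p for p, t in occurrences if t in lexicon)
        # Confidence: support divided by all occurrences of that context.
        accepted = {p for p in support
                    if support[p] >= min_support
                    and support[p] / total[p] >= min_confidence}
        # Prevalence: number of distinct accepted contexts an unknown term fills.
        prevalence = defaultdict(set)
        for p, t in occurrences:
            if p in accepted and t not in lexicon:
                prevalence[t].add(p)
        new_terms = {t for t, ps in prevalence.items() if len(ps) >= min_prevalence}
        if not new_terms:
            break  # no new entries: the lexicon has converged
        lexicon |= new_terms
    return lexicon

# Hypothetical toy corpus, tokenized on whitespace.
corpus = [s.split() for s in [
    "patient takes lipitor daily", "patient takes aspirin daily",
    "patient takes zocor daily", "patient takes walks daily",
    "started on lipitor today", "started on aspirin today",
    "started on zocor today",
]]
print(sorted(bootstrap(corpus, {"Lipitor", "aspirin"})))
# -> ['aspirin', 'lipitor', 'zocor']
```

With the default thresholds, "zocor" is admitted because it fills two accepted contexts, while "walks" fills only one; lowering `min_prevalence` to 1 admits "walks" as well, a small illustration of the kind of threshold tradeoff described above.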
In this paper, we discuss the different strategies used in COMET (COordinated Multimedia Explanation Testbed) for selecting words with which the user is familiar. When pictures cannot be used to disambiguate a word or phrase, COMET has four strategies for avoiding unknown words. We give examples of each of these strategies and show how they are implemented in COMET.
Background: The first step in practising Evidence Based Medicine (EBM) has been described as translating clinical uncertainty into a structured and focused clinical question that can be used to search the literature to ascertain or refute that uncertainty. In this study we focus on questions about treatments for schizophrenia posed by mental health professionals and patients, to gain a deeper understanding of the types of questions asked naturally and of whether they can be reformulated into structured and focused clinical questions. Methods: From a survey of uncertainties about the treatment of schizophrenia, we describe, categorise and analyse the types of questions asked by mental health professionals and patients. We explore the value of mapping from an unstructured to a structured framework, test inter-rater reliability for this task, develop a linguistic taxonomy, and cross-tabulate that taxonomy with the elements of a well-structured clinical question. Results: Few of the 78 patients and 161 clinicians spontaneously asked well-structured questions about treatment uncertainties for schizophrenia. Uncertainties were most commonly about drug treatments (45.3% of clinicians and 41% of patients) or psychological therapies (19.9% of clinicians and 9% of patients), or were unclassifiable (11.8% of clinicians and 16.7% of patients). Few naturally asked questions could be classified using the well-structured and focused clinical question format (i.e. PICO format). A simple linguistic taxonomy better described the types of questions people naturally ask. Conclusion: People do not spontaneously ask well-structured clinical questions. Other taxonomies may better capture the nature of these questions. However, access to EBM resources is greatly facilitated by framing enquiries in the language of EBM, such as posing queries in PICO format, and people do not naturally do this.
It may be preferable to identify a way of searching the literature that more closely matches the way people naturally ask questions if access to information about treatments is to be made more broadly available.