Health care professionals produce abundant information in their daily clinical practice, and this information is stored in many diverse sources, generally in textual form. Extracting insights from all the gathered information, which is mainly unstructured and lacks normalization, is one of the major challenges in computational medicine. In this respect, text mining (TM) brings together different techniques for deriving valuable insights from unstructured textual data, which has made it especially relevant in medicine. The aim of this paper is therefore to provide an extensive review of existing techniques and resources for performing TM tasks in medicine. In this review, more than 90 relevant research studies are analyzed, describing the most important practical applications, terminological resources, tools, and open challenges of TM in medicine.
This article is categorized under:
Application Areas > Health Care
Algorithmic Development > Biological Data Mining
Algorithmic Development > Hierarchies and Trees
Algorithmic Development > Ensemble Methods
Existing systems that support the decision-making process based on the textual information in clinical reports are insufficient. Currently, few systems unify different subtasks in a single, user-friendly framework that eases clinical work by automating complex and arduous tasks such as the detection of clinical alerts and the coding of clinical information. To address this issue, MiNerDoc is proposed as a new text mining (TM) system whose main objective is to support clinical decision-making processes by analyzing textual clinical reports in a unified framework. MiNerDoc covers two relevant tasks in the medical field: detection of risk factors according to five medical entities (disease, pharmacologic, body region/part, procedure/test, and finding/sign) and automatic prediction of standardized diagnostic codes (MeSH descriptors associated with diseases). MiNerDoc integrates a combination of techniques from the TM discipline with the terminological and semantic enrichment provided by the MetaMap tool and the UMLS Metathesaurus. Several case studies and an extensive experimental analysis on real clinical reports have been carried out to demonstrate the effectiveness and promising performance of MiNerDoc on two different tasks: medical entity recognition (F-measure 81.54%) and diagnostic classification (micro-averaged F-measure 81.04%).
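The diagnostic classification score above is reported as a micro-averaged F-measure. As a minimal sketch of how such a score is commonly computed for multi-label code prediction, the Python snippet below pools true positives, false positives, and false negatives over all reports before computing precision and recall. It is not taken from MiNerDoc: the function name `micro_f_measure`, the placeholder code labels, and the toy data are assumptions made purely for illustration.

```python
from typing import Sequence, Set, Tuple


def micro_f_measure(
    gold: Sequence[Set[str]], predicted: Sequence[Set[str]]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure), micro-averaged over all reports.

    Each element of `gold` / `predicted` is the set of codes assigned to one
    report. Counts are pooled across reports before the ratios are taken.
    """
    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, predicted):
        tp += len(gold_codes & pred_codes)   # codes correctly predicted
        fp += len(pred_codes - gold_codes)   # predicted but not in gold
        fn += len(gold_codes - pred_codes)   # in gold but not predicted

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical toy data: placeholder labels, not real MeSH descriptors.
gold = [{"code_a", "code_b"}, {"code_c"}, {"code_a"}]
predicted = [{"code_a"}, {"code_c", "code_b"}, {"code_a"}]

precision, recall, f_measure = micro_f_measure(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
```

On this toy data the pooled counts are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F-measure all equal 0.75. Because counts are pooled, frequent codes weigh more heavily than rare ones; a macro average would instead compute the F-measure per code and then average.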