Abstract. Within the scientific literature, tables are commonly used to present factual and statistical information in a compact form that is easy for readers to digest. The ability to "understand" the structure of tables is key for information extraction in many domains. However, the complexity and variety of presentation layouts and value formats make it difficult to automatically extract the roles and relationships of table cells. In this paper, we present a model that structures tables in a machine-readable way and a methodology to automatically disentangle and transform tables into the modelled data structure. The method was tested in the domain of clinical trials: it achieved an F-score of 94.26% for cell function identification and 94.84% for identification of inter-cell relationships.
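As a rough illustration of what a machine-readable table model with cell functions and inter-cell relationships could look like, here is a minimal Python sketch. It is not the authors' model: the `Cell` class, the role labels (`header`, `stub`, `data`), and the layout assumption (first row holds column headers, first column holds row labels) are all hypothetical and cover only the simplest table shape.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cell:
    """One table cell with a functional role and links to related cells."""
    row: int
    col: int
    text: str
    role: str  # hypothetical labels: "header", "stub", or "data"
    headers: List[str] = field(default_factory=list)  # column headers governing this cell
    stub: Optional[str] = None  # row label governing this cell


def annotate(grid: List[List[str]]) -> List[Cell]:
    """Assign roles and relationships, assuming the first row is the
    column header and the first column is the stub (an assumption that
    real tables frequently violate)."""
    cells = []
    for r, row in enumerate(grid):
        for c, text in enumerate(row):
            if r == 0:
                role = "header"
            elif c == 0:
                role = "stub"
            else:
                role = "data"
            cell = Cell(r, c, text, role)
            if role == "data":
                cell.headers = [grid[0][c]]
                cell.stub = row[0]
            cells.append(cell)
    return cells


# Illustrative values only, not taken from any paper.
grid = [
    ["Characteristic", "Drug", "Placebo"],
    ["Age (years)", "54.2", "53.8"],
    ["Male (%)", "61", "58"],
]
cells = annotate(grid)
```

With this structure, each data cell can be read together with its context: the cell containing `53.8` is linked to the header `Placebo` and the stub `Age (years)`.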
The scientific literature is growing exponentially, and professionals are no longer able to cope with the current volume of publications. Text mining has in the past provided methods to retrieve and extract information from text; however, most of these approaches ignored tables and figures. Research on mining table data still lacks an integrated approach that considers all the complexities and challenges of a table. Our research examines methods for extracting numerical (number of patients, age, gender distribution) and textual (adverse reactions) information from tables in the clinical literature. We present a requirement analysis template and an integral methodology for information extraction from tables in the clinical domain that contains 7 steps: (1) table detection, (2) functional processing, (3) structural processing, (4) semantic tagging, (5) pragmatic processing, (6) cell selection and (7) syntactic processing and extraction. Our approach achieved an F-measure between 82% and 92%, depending on the variable, the task and its complexity.
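For readers unfamiliar with the metric reported in these abstracts, the sketch below computes a balanced F-measure (F1) from counts of true positives, false positives and false negatives. The abstracts do not say which F-measure variant was used, so the balanced form is an assumption here, and the function name `f_measure` is illustrative.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    tp: correctly extracted items
    fp: extracted items that are wrong
    fn: items that should have been extracted but were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration: 8 correct, 2 spurious, 2 missed.
score = f_measure(tp=8, fp=2, fn=2)
```

Because the harmonic mean is pulled toward the lower of the two values, a high score requires both precision and recall to be high.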
Current biomedical text mining efforts are mostly focused on extracting information from the body of research articles. However, tables contain important information such as key characteristics of clinical trials. Here, we examine the feasibility of information extraction from tables. We focus on extracting data about clinical trial participants. We propose a rule-based method that decomposes tables into cell-level structures and then extracts information from these structures. Our method achieved an F-measure of 83.3% for extraction of the number of patients, 83.7% for extraction of patients' body mass index and 57.75% for patients' weight. These results are promising and show that information extraction from tables in biomedical literature is feasible.
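To make the idea of rule-based extraction from cell-level structures concrete, here is a small Python sketch of one possible rule: pulling a participant count out of cell text such as a column header. The pattern and the function name `extract_participants` are assumptions for illustration; the abstract does not describe the actual rules used.

```python
import re
from typing import Optional

# Hypothetical rule: participant counts often appear as "n = 45" or "N=120".
N_PATTERN = re.compile(r"\bn\s*=\s*(\d+)", re.IGNORECASE)


def extract_participants(cell_text: str) -> Optional[int]:
    """Return the participant count found in a cell, or None if the
    rule does not match."""
    match = N_PATTERN.search(cell_text)
    return int(match.group(1)) if match else None


# Illustrative header cells, not taken from any paper.
placebo_n = extract_participants("Placebo (n = 45)")
drug_n = extract_participants("Drug (N=120)")
no_match = extract_participants("Age (years)")
```

A single pattern like this misses many real layouts (counts in separate rows, ranges, or footnotes), which is one plausible reason why accuracy varies so much between variables.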
Data collected throughout the duration of a clinical trial can amount to tens of thousands, or even hundreds of thousands, of data points, which require expert interpretation and analysis to determine the efficacy, tolerability and safety profile of an investigational drug. Continuous monitoring and interpretation of these raw data are critical in maintaining patients' safety. Realising this, however, has proved a significant challenge, due to the requirement to manually aggregate patient and population data to compile differing clinical data types (e.g. Adverse Events (AEs) and laboratory measurements) over multiple time-points. Additional data challenges are identified in data formatting and presentation, which are required for successful and accurate interpretation. Furthermore, once a clinical trial has finished, analysis and interpretation of the validated data is mandatory. In order to address the key data challenges, we have developed automated data integration and visualisation tools: REACT (REal-time Analytics for Clinical Trials) for on-going trials, and DETECT (Data Evaluation Tool for the End of Clinical Trial data) for finished trials. In this talk, REACT and DETECT will be presented to show how they provide an intuitive visual platform for data interpretation, enabling physicians to interact with the data, quickly view relationships between different clinical data and assess these data over time, at both a patient and a population level. The results help us to keep our trial participants safe, improve our ability to efficiently make data-driven, scientific decisions and ultimately contribute to the development of medical treatments to improve the lives of patients.