Quantitative information derived from scientific documents is an important source of data for studies in almost all domains; however, manually extracting this information is very time-consuming. In this paper we introduce Geo-Quantities, a system that supports the automatic extraction of quantitative, spatial, and temporal information about a given measurement entity from scientific literature using text mining techniques. The difficulty of automatic measurement recognition stems mainly from the diverse ways in which measurements are expressed in papers. Geo-Quantities offers an interactive interface for visualizing the extracted user-defined information, in particular its spatial and temporal context. In our demonstration, we showcase the capabilities of the system by retrieving measurements such as "mass accumulation rates" and "sedimentation rates" from scientific publications in marine geology, which could have a high impact on studies that build global mass accumulation rate maps. For training and evaluating Geo-Quantities, we use a corpus of domain-relevant papers.
CCS CONCEPTS: • Applied computing → Document management and text processing; • Computing methodologies → Information extraction.
<p>In marine geology, scientists routinely carry out extensive experiments to measure diverse features across the globe in order to estimate environmental change. For example, Mass Accumulation Rate (MAR) and Sedimentation Rate (SR) are measured by marine geologists at various oceanographic locations and are widely reported in research publications, but they have not been compiled in any central database. Furthermore, every MAR and SR observation normally carries <em>i)</em> exact locational information (longitude and latitude), <em>ii)</em> the method of measurement (e.g., stratigraphy, <sup>210</sup>Pb), <em>iii)</em> a numerical value and units (e.g., 2.4 g/m<sup>2</sup>/yr), and <em>iv)</em> a temporal feature (e.g., a hundred years ago). The contextual information attached to MAR and SR observations is heterogeneous, and manual approaches to extracting it from text are infeasible. It is also worth mentioning that MAR and SR are not reported in International System (SI) units.</p> <p>We propose GEOTEK (Geological Text to Knowledge), a comprehensive end-to-end framework for extracting targeted information from marine geology publications. The framework comprises three modules. The first module combines a document relevance model, which filters relevant sources using metadata, with a PDF extractor, which extracts text, tables, and metadata. The second module mainly comprises two information extractors, Geo-Quantities and Geo-Spacy, both trained specifically on text from the marine geology domain. Geo-Quantities extracts relevant numerical information from the text and covers more than 100 unit variants for MAR and SR, while Geo-Spacy extracts a set of relevant named entities as well as locational entities, which are further processed to obtain the corresponding geocode boundaries. 
The third module, the Heterogeneous Information Linking (HIL) module, processes exact spatial information from tables and captions and links it to the previously extracted measurements. Finally, all linked information is displayed in an interactive map view.</p>
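To make the quantity-extraction step more concrete, the following is a minimal sketch of how a numerical value and its unit might be pulled from a sentence and tagged as MAR or SR. It is not the Geo-Quantities implementation, which is described as a trained extractor covering more than 100 unit variants; here a small hand-written regular expression stands in for it. The names `UNIT`, `QUANTITY`, `KINDS`, `Measurement`, and `extract_measurements` are all hypothetical, and the unit list is an illustrative assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a tiny illustrative unit grammar, i.e. mass or length,
# optionally per area, per time. The real system covers far more variants.
UNIT = r"(?:mg|g|kg|mm|cm|m)(?:\s*/\s*(?:cm|m)\^?2)?\s*/\s*(?:yr|kyr|ka|a)"

# A number followed by a unit; the lookarounds avoid matching inside
# longer tokens (e.g. the "210" in "210Pb" has no unit after it).
QUANTITY = re.compile(
    rf"(?<![\w.])(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>{UNIT})(?![A-Za-z])"
)

# Hypothetical keyword patterns for the two measurement entities.
# The full names are case-insensitive; the acronyms are case-sensitive.
KINDS = {
    "MAR": re.compile(r"(?i:mass accumulation rates?)|\bMAR\b"),
    "SR": re.compile(r"(?i:sedimentation rates?)|\bSR\b"),
}


@dataclass
class Measurement:
    kind: Optional[str]  # "MAR", "SR", or None if no keyword was found
    value: float
    unit: str            # unit with whitespace and "^" removed


def extract_measurements(sentence: str) -> List[Measurement]:
    """Return every value/unit pair in the sentence, tagged with the
    first measurement kind whose keyword appears in that sentence."""
    kind = next(
        (name for name, pattern in KINDS.items() if pattern.search(sentence)),
        None,
    )
    results = []
    for match in QUANTITY.finditer(sentence):
        unit = re.sub(r"[\s^]", "", match.group("unit"))
        results.append(Measurement(kind, float(match.group("value")), unit))
    return results


if __name__ == "__main__":
    text = "The mass accumulation rate at the site was 2.4 g/m2/yr."
    for measurement in extract_measurements(text):
        print(measurement)
```

A rule-based pattern like this is easy to inspect but brittle against the diverse expressions found in papers, which is presumably why the framework relies on domain-trained extractors. Linking each measurement to its location, method, and temporal context would be a separate step.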