Massive unstructured geoscience data are buried in geological reports. Geological text classification provides opportunities to leverage this wealth of data for geology and mineralization research. Existing studies of massive geoscience documents/reports have not provided effective classification results for further knowledge discovery and data mining and often lack adequate domain‐specific knowledge. In this paper, we present a novel and unified framework (namely, Dic‐Att‐BiLSTM) that combines domain‐specific knowledge and bidirectional long short‐term memory (BiLSTM) for effective geological text classification. Dic‐Att‐BiLSTM benefits from a matching strategy by incorporating domain‐specific knowledge developed based on geoscience ontology to grasp the linguistic geoscience clues. Furthermore, Dic‐Att‐BiLSTM brings together the capacity of a geoscience dictionary matching approach and an attention mechanism to construct a dictionary attention layer. Finally, the network framework of Dic‐Att‐BiLSTM can utilize domain‐specific knowledge and classify geological text automatically. Experimental verifications are conducted on two constructed data sets, and the results clearly indicate that Dic‐Att‐BiLSTM outperforms other state‐of‐the‐art text classification models.
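One way to picture the dictionary attention layer is as a softmax over per-token scores in which tokens matched by the geoscience dictionary receive extra weight. The following is only a minimal sketch under that assumption, not the authors' implementation: the function name `dictionary_attention`, the additive `bonus` parameter, the toy dictionary, and the all-zero scores (standing in for BiLSTM-derived scores) are all hypothetical.

```python
import math

def dictionary_attention(tokens, scores, dictionary, bonus=1.0):
    """Softmax over token scores, with a fixed bonus added to dictionary matches.

    Hypothetical illustration only: the additive bonus is an assumption,
    not the mechanism described in the paper.
    """
    # Raise the raw score of every token found in the domain dictionary.
    biased = [s + bonus if t in dictionary else s for t, s in zip(tokens, scores)]
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(biased)
    exps = [math.exp(b - peak) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["the", "granite", "intrusion", "hosts", "gold"]
scores = [0.0, 0.0, 0.0, 0.0, 0.0]            # placeholder for BiLSTM-derived scores
geo_terms = {"granite", "intrusion", "gold"}  # toy stand-in for a geoscience dictionary
weights = dictionary_attention(tokens, scores, geo_terms)
```

With equal raw scores, the three dictionary terms share the same weight and each outweighs the two unmatched tokens, which is the behaviour a dictionary-guided attention layer is meant to encourage.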
Toponym recognition, the extraction of toponyms from natural language texts, is a fundamental task of ubiquitous geographic information applications. Existing toponym recognition methods with state‐of‐the‐art performance mainly leverage supervised learning (i.e., deep‐learning‐based approaches) with parameters learned from massive labeled datasets that must be annotated manually. This is a great inconvenience when a model must be retrained to fit texts from different domains, especially social media messages. To address this issue, this article proposes a weakly supervised Chinese toponym recognition (ChineseTR) architecture with two components: a training dataset creator, which generates training datasets automatically from word collections and associated word frequencies drawn from various texts, and an extension recognizer, which employs a basic bidirectional recurrent neural network based on particular features designed for toponym recognition. The results show that the proposed ChineseTR achieves a 0.76 F1 score in a corpus with a 0.718 out‐of‐vocabulary rate and a 0.903 in‐vocabulary rate. All comparative experiments demonstrate that ChineseTR is an effective and scalable architecture that recognizes toponyms.
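The idea of creating labeled training data automatically from a word collection and its frequencies can be sketched roughly as follows. This is an illustrative assumption about how such a creator might work, not the ChineseTR implementation: the function `make_example`, the injection of one toponym from a separate toponym list, the character-level `B-LOC`/`I-LOC`/`O` tags, and the toy words and counts are all hypothetical.

```python
import random

def make_example(words, freqs, toponyms, length=6, rng=None):
    """Build one synthetic, character-level labeled sequence (hypothetical sketch)."""
    rng = rng or random.Random()
    # Draw context words in proportion to their corpus frequency.
    context = rng.choices(words, weights=freqs, k=length)
    # Insert a single known toponym at a random position in the context.
    toponym = rng.choice(toponyms)
    position = rng.randrange(length + 1)
    sequence = context[:position] + [toponym] + context[position:]
    chars, labels = [], []
    for index, word in enumerate(sequence):
        for offset, ch in enumerate(word):
            chars.append(ch)
            if index == position:
                # First character of the toponym opens the span, the rest continue it.
                labels.append("B-LOC" if offset == 0 else "I-LOC")
            else:
                labels.append("O")
    return chars, labels

words = ["我", "在", "今天", "下雨", "去"]   # toy word collection
freqs = [50, 40, 20, 10, 30]                 # toy frequencies, not real corpus counts
toponyms = ["北京", "武汉市"]                # toy toponym list
chars, labels = make_example(words, freqs, toponyms, rng=random.Random(0))
```

Because the toponym position is known at generation time, the labels come for free, which is what removes the manual annotation step in a weakly supervised setup of this kind.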
Spatial relations are frequently described and used in natural language texts, and they play a core role in a range of applications, from supporting geographic information retrieval in natural language texts to locating people and objects in natural disaster response situations. In this article, we present a neuro‐net spatial extraction model (NeuroSPE) designed to address various language irregularities (i.e., a variety of sentence structures) that occur in natural language texts. We also propose a two‐stage workflow to generate a training dataset based on a collection of words and their associated frequencies. The first stage of the proposed workflow processes the words in the input data and their associated frequencies; the words are then segmented into a set of groups and used to accelerate model training. The second stage automatically generates a variety of sentences that include two geographic entities and related spatial relation terms through deep learning iteration based on a unigram language model. We evaluate our method both qualitatively and quantitatively using a real dataset. The experimental results demonstrate that the proposed two‐stage workflow effectively extracts spatial relations from natural language texts and outperforms other current state‐of‐the‐art approaches.
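The sentence-generation stage can be illustrated with a unigram language model, in which each word is drawn independently with probability proportional to its count. The sketch below is a simplified assumption and not the NeuroSPE workflow: it uses a single fixed template (filler words, then entity, relation term, entity), whereas the paper targets a variety of sentence structures, and the names `unigram_model` and `generate_sample` as well as the toy counts, entities, and relation terms are hypothetical.

```python
import random

def unigram_model(word_counts):
    """Turn raw word counts into unigram probabilities P(w) = count(w) / total."""
    total = sum(word_counts.values())
    return {word: count / total for word, count in word_counts.items()}

def generate_sample(model, entities, relations, n_fillers=2, rng=None):
    """Generate one synthetic sentence and its spatial relation triple (sketch)."""
    rng = rng or random.Random()
    words = list(model)
    probs = [model[w] for w in words]
    # Pick two distinct geographic entities and one spatial relation term.
    head, tail = rng.sample(entities, 2)
    relation = rng.choice(relations)
    # Filler words are sampled independently from the unigram distribution.
    fillers = rng.choices(words, weights=probs, k=n_fillers)
    # Fixed template, assumed here for simplicity: fillers, head, relation, tail.
    sentence = fillers + [head, relation, tail]
    return sentence, (head, relation, tail)

counts = {"the": 120, "station": 15, "new": 30, "old": 25}  # toy counts
model = unigram_model(counts)
entities = ["Wuhan", "Yangtze River", "Hubei"]               # toy entities
relations = ["north of", "adjacent to", "within"]            # toy relation terms
sentence, triple = generate_sample(model, entities, relations, rng=random.Random(1))
```

Each generated sentence is paired with the triple that produced it, so the pair can serve directly as a labeled training example for a relation extractor.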