2021
DOI: 10.1029/2021ea001673

Chinese Word Segmentation Based on Self‐Learning Model and Geological Knowledge for the Geoscience Domain

Abstract: …is used to capture the abundant word-level, grammatical-structure, and semantic features in sentences. The self-learning strategy, assisted by domain knowledge, automatically constructs the domain training corpus without manual intervention. A set of experiments was conducted to verify the effectiveness of the proposed method on an available, manually constructed hybrid dataset.

citations
Cited by 12 publications
(11 citation statements)
references
References 36 publications
“…Their algorithm was able to segment both generic domain words and geological domain words. Li et al (2021) constructed a Chinese word segmentation algorithm based on a geological domain ontology assisted by a self-loop approach to better segment geological domain texts. Ma et al (2021) employed a deep learning model to train journal abstracts and titles in the field of Chinese geology.…”
Section: Text Mining In Geoscience
mentioning
confidence: 99%
“…Li et al. (2021) constructed a Chinese word segmentation algorithm based on a geological domain ontology assisted by a self‐loop approach to better segment geological domain texts. Ma et al.…”
Section: Related Work
mentioning
confidence: 99%
“…The results show that the F1 value (the summed average of precision and recall) of the model is 94%, and the performance of the geological domain subword recognition is better than that of other traditional models; the BiLSTM is projected to be an effective method for geological big data mining. The proposed approach in this research is based on our previous work (Li et al., 2021). Compared with this work, the following improvements are noted in the current research: (a) In this study, an approach to geological text lemmatization using a Chinese geological domain data set after training is presented.…”
Section: Introduction
mentioning
confidence: 99%
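Note on the F1 value in the statement above: the quote glosses it as a "summed average" of precision and recall, whereas in standard usage F1 is their harmonic mean, F1 = 2PR / (P + R). A minimal sketch of how this is commonly computed for word segmentation follows. It assumes that a predicted word counts as correct only when its character span matches a gold word exactly. The function names and the example sentence are illustrative assumptions and are not taken from the cited papers.

```python
def to_spans(words):
    """Convert a segmented sentence (list of words) into a set of
    (start, end) character spans, so segmentations can be compared."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold_words, pred_words):
    """Return (precision, recall, F1) for one sentence.

    Assumption: a predicted word is correct only if its span
    exactly matches a span in the gold segmentation.
    """
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the prediction wrongly splits the first word.
gold = ["花岗岩", "侵入", "砂岩"]
pred = ["花岗", "岩", "侵入", "砂岩"]
precision, recall, f1 = segmentation_f1(gold, pred)
```

In this example two of the four predicted words match the three gold words, so precision is 2/4, recall is 2/3, and the harmonic mean falls below their arithmetic average.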
“…Compared with this work, the following improvements are noted in the current research: (a) In this study, an approach to geological text lemmatization using a Chinese geological domain data set after training is presented. The model is more efficient and practical compared to the model proposed by Li et al (2021), in which a generic domain data set was utilized for training, and then, a round-robin strategy for multiple tumbling was used to construct the word segmentation model. (b) In contrast to the research by Li et al (2021), the existing state-of-the-art pretraining model, that is, GeoBERT, is applied in this research, resulting in better recognition performance.…”
mentioning
confidence: 99%