2019
DOI: 10.12783/dtcse/ammms2018/27260

Research on Domain Term Dictionary Construction Based on Chinese Wikipedia

Abstract: Domain terms are words or phrases that represent concepts or relationships in a specific domain, and they characterize that domain. The automatic construction of a domain-specific dictionary is an important task in natural language processing and can be applied in domain-specific ontology construction, vertical search, text classification, information retrieval, question answering systems, etc. In this paper, we propose a novel method for constructing a domain term dictionary based on …

Cited by 5 publications (3 citation statements). References 3 publications.
“…catDescendantsIV(c) = 1 − log(des(c)) / log(N)  (5), where catDescendantsIV(c) is the information value of category c, des(c) is the number of descendants of category c, and N is the set of all categories in Wikipedia. It is assumed that a category with a large number of descendants is more general and thus has less information value.…”
Section: catDescendantsIV(c) = 1 − log(des(c)) / log(N)
confidence: 99%
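As a quick illustration of the quoted Eq. (5), here is a minimal Python sketch. It assumes N is the total number of Wikipedia categories; the descendant counts and the value of N used below are hypothetical, not figures from the cited paper.

```python
import math

def cat_descendants_iv(des_c: int, n_categories: int) -> float:
    """Information value of a Wikipedia category, per Eq. (5):
    catDescendantsIV(c) = 1 - log(des(c)) / log(N).
    des_c: number of descendants of category c.
    n_categories: N, interpreted here as the total number of categories (assumption)."""
    des_c = max(des_c, 1)  # guard against log(0) for categories with no descendants
    return 1.0 - math.log(des_c) / math.log(n_categories)

# Toy numbers: a category with many descendants is more general,
# so it receives a lower information value.
N = 2_000_000                           # hypothetical total category count
print(cat_descendants_iv(150_000, N))   # general category  -> ~0.18
print(cat_descendants_iv(12, N))        # specific category -> ~0.83
```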
“…Due to this importance and popularity, many works have exploited Wikipedia as a knowledge source to measure the semantic relatedness between terms. Some of these works have exploited the structure of Wikipedia articles, such as categories, hyperlinks, and templates [4,5]. Another line of work has attempted to measure similarity based on natural language processing of the textual content of Wikipedia articles [6-9].…”
Section: Introduction
confidence: 99%
“…Embedding. First, the word segmentation tool jieba [23], together with the vehicle domain dictionary constructed in our previously published paper [24], is applied to the complaint text S. Then the segmented complaint text S is vectorized in the embedding layer, which reduces the input dimension and the number of parameters of the neural network. Furthermore, the dense vector representation of the word vector layer can contain more semantic information [25].…”
Section: Multi-label Classification Of Vehicle Defect Information Col
confidence: 99%
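A minimal sketch of the preprocessing described in this excerpt: segment a complaint text with jieba using a custom domain dictionary, then map the tokens to dense vectors with an embedding layer. The dictionary file name, the example text, and the PyTorch embedding layer are assumptions for illustration; the cited paper's actual dictionary and model configuration are not reproduced here.

```python
import jieba
import torch
import torch.nn as nn

# Hypothetical file holding the vehicle domain terms, one term per line.
jieba.load_userdict("vehicle_domain_dict.txt")

text = "发动机启动时异响严重"        # example complaint text S ("severe abnormal engine noise at startup")
tokens = jieba.lcut(text)            # word segmentation with the domain dictionary loaded

# Toy vocabulary built from this one sentence; a real system uses a fixed corpus vocabulary.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
ids = torch.tensor([[vocab[t] for t in tokens]])    # shape: (1, sequence_length)

# Embedding layer producing dense, low-dimensional token vectors for the classifier.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=128)
vectors = embedding(ids)
print(tokens)
print(vectors.shape)                 # torch.Size([1, len(tokens), 128])
```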