2021
DOI: 10.1155/2021/3790176

Chinese Personal Name Disambiguation Based on Clustering

Abstract: Personal name disambiguation is a significant issue in natural language processing and underlies many tasks in automatic information processing. This research explores Chinese personal name disambiguation based on clustering techniques. First, preprocessing transforms the raw corpus into a standardized format. Then, Chinese word segmentation, part-of-speech tagging, and named entity recognition are performed through lexical analysis. Furthermore, we make an effort to extract fea…
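The abstract names the processing stages, but as truncated here it does not state which features are extracted or which clustering algorithm is used. The sketch below illustrates only the general idea of the final stage, under loudly stated assumptions: each mention of an ambiguous name is represented by the already-segmented words around it (segmentation, POS tagging, and NER are not shown), mentions are compared with bag-of-words cosine similarity, and any pair above a hypothetical `threshold` is linked by single-link clustering. The names `cosine`, `cluster_mentions`, and `threshold` are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_mentions(contexts, threshold=0.3):
    """Group mentions of one name into presumed distinct persons.

    contexts: list of token lists (assumed already segmented), one per document.
    threshold: hypothetical similarity cut-off for linking two mentions.
    Returns a sorted list of clusters, each a list of mention indices.
    """
    vectors = [Counter(tokens) for tokens in contexts]
    parent = list(range(len(vectors)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single-link: any pair at or above the threshold joins the same cluster.
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Four toy contexts for the same name: two academic, two sports.
docs = [
    ["教授", "大学", "物理", "研究"],
    ["大学", "物理", "论文", "教授"],
    ["足球", "比赛", "进球", "球队"],
    ["球队", "足球", "教练", "比赛"],
]
print(cluster_mentions(docs))  # [[0, 1], [2, 3]]
```

Single-link threshold clustering is chosen here only because it needs no preset number of persons; a real system might instead use weighted features (for example, named entities found by NER) or a different clustering method.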

Cited by 1 publication (1 citation statement); references 12 publications.
“…For example, the combination Gaofeng (高峰) could be the surname Gao followed by the given name Feng, but it could also be the word for 'peak'.⁸ Han, Zu and Zhao (2011) and Fan and Li (2021) describe clustering-based approaches in which the same name appearing in different documents is disambiguated by reference to other words that appear with it in the text. The problems these papers address are different from the one we face in our own linkage of names in tabular datasets, where the surname and given name are clearly specified in separate fields, but they are relevant to efforts by others to extract and disambiguate names in unstructured historical texts such as newspaper articles, books, and essays.…”
Section: Introduction (mentioning)
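The Gaofeng (高峰) example shows why name recognition interacts with word segmentation: the same character string admits both a "surname + given name" reading and an ordinary-word reading. The sketch below simply enumerates those candidate readings. The function `readings` and the tiny surname set and lexicon are hypothetical toy resources for illustration; a real system would use a full surname list and a segmentation dictionary, and would then need context to choose between the readings.

```python
def readings(text, surnames, lexicon):
    """List candidate readings of a short character string.

    surnames, lexicon: hypothetical toy resources (sets of strings).
    Returns tuples ("word", w) or ("name", surname, given_name).
    """
    out = []
    if text in lexicon:
        out.append(("word", text))
    # Try single-character (e.g. 高) and double-character (e.g. 欧阳) surnames,
    # followed by a one- or two-character given name.
    for k in (1, 2):
        surname, given = text[:k], text[k:]
        if surname in surnames and 1 <= len(given) <= 2:
            out.append(("name", surname, given))
    return out


print(readings("高峰", {"高", "欧阳"}, {"高峰", "山峰"}))
# [('word', '高峰'), ('name', '高', '峰')]
```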