2014
DOI: 10.1007/978-3-319-12640-1_34

Radical-Enhanced Chinese Character Embedding

Abstract: We present a method that leverages radicals for learning Chinese character embeddings. A radical is a semantic and phonetic component of a Chinese character. It plays an important role, as characters with the same radical usually have similar semantic meaning and grammatical usage. However, existing Chinese processing algorithms typically regard the word or character as the basic unit but ignore the crucial radical information. In this paper, we fill this gap by leveraging radicals for learning continuous representation of C…

Cited by 103 publications (68 citation statements)
References 13 publications
“…We learn an embedding for each character in the training corpus (Sun et al, 2014; …). This removes the dependency on pre-processing the text, and better fits our intended use case: NER tagging over characters. Since there are many fewer characters than words, we learn many fewer embeddings.…”
Section: Character Embeddings
Type: mentioning
confidence: 99%
“…Incorporating subword information for word embeddings (Bojanowski et al, 2017; Cotterell et al, 2016b; Wieting et al, 2016; Yin et al, 2016) facilitates modeling rare words and can improve the performance of several NLP tasks to which the embeddings are applied. Besides, people also consider character embeddings which have been utilized in Chinese word segmentation (Sun et al, 2014).…”
Section: Related Work
Type: mentioning
confidence: 99%
“…With regards to ideographic languages, there is work in information retrieval that has considered the appropriate representation for indexing; the focus has typically been word versus character (Kwok, 1997; Baldwin, 2009), but Fujii and Croft (1993) considered (though ultimately rejected) subcharacter based indexing. In terms of investigations of the usefulness of sub-character representations for neural network models in ideographic languages, relevant work includes recent papers that use sub-character information to assist in the training of character embeddings for Chinese (Sun et al, 2014; Li et al, 2015; Yin et al, 2016) or build sub-character embeddings directly (Shi et al, 2015), demonstrating that sub-character information is useful for representing semantics in Chinese. However, our work differs not only in language and task, but also in our use of decomposition, since the work done in Chinese has primarily focused on a single semantically relevant sub-character (known as the radical), despite the fact that other sub-characters do provide additional semantic information in some characters.…”
Section: Related Work
Type: mentioning
confidence: 99%