Proceedings of the Third Workshop on Representation Learning for NLP 2018
DOI: 10.18653/v1/w18-3011

A Hybrid Learning Scheme for Chinese Word Embedding

Abstract: To improve word embeddings, subword information has been widely employed in state-of-the-art methods. These methods can be classified as either compositional or predictive models. In this paper, we propose a hybrid learning scheme that integrates the compositional and predictive models for word embedding. Such a scheme can take advantage of both models and thus learn word embeddings effectively. The proposed scheme has been applied to learn word representations for Chinese. Our results show that the proposed scheme c…
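The abstract is truncated, so the paper's exact objective is not reproduced below. As a rough, hedged illustration of the idea it describes, the following sketch combines a predictive signal (an atomic per-word vector, as in skip-gram) with a compositional signal (a sum of subword-unit vectors); the names, the subword units, and the interpolation weight `alpha` are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: mixing a predictive (atomic) and a compositional
# (subword-sum) word vector. All parameters here are hypothetical.
import numpy as np

DIM = 100
rng = np.random.default_rng(0)

# Predictive part: one atomic vector per word, as in skip-gram.
word_vec = {"葉": rng.normal(scale=0.1, size=DIM)}

# Compositional part: vectors for subword units, summed per word.
subword_vec = {s: rng.normal(scale=0.1, size=DIM) for s in ("艹", "世", "木")}

def hybrid_vector(word, subwords, alpha=0.5):
    """Interpolate the atomic and composed vectors.

    alpha is an assumed mixing weight; the paper may combine the two
    models differently (e.g. via a joint training objective).
    """
    composed = np.sum([subword_vec[s] for s in subwords], axis=0)
    return alpha * word_vec[word] + (1 - alpha) * composed

v = hybrid_vector("葉", ["艹", "世", "木"])
print(v.shape)  # (100,)
```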

Cited by 5 publications (3 citation statements) · References 13 publications
“…For ideographic languages like ZH, word embeddings trained on stroke signals (which are analogous to subword information in alphabetic languages) achieve state-of-the-art performance (Cao et al., 2018), so we utilise them to obtain monolingual vectors. Compared with simplified characters (which dominate our training resources), traditional ones typically provide much richer stroke signals and thus benefit stroke-based embeddings (Chen and Sheng, 2018); e.g., the traditional '葉' (leaf) contains the semantically related components '艹' (plant) and '木' (wood), while its simplified version ('叶') does not.…”
Section: Methods
confidence: 99%
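The stroke-signal idea referenced above (Cao et al., 2018) can be sketched as follows: concatenate each character's stroke sequence, slide n-gram windows over it, and compose the word vector from the n-gram vectors. The stroke-ID sequences below are illustrative stand-ins, not a real stroke dictionary, and the composition is a simplified assumption about that family of methods.

```python
# Hedged sketch of stroke-n-gram features. STROKES is a toy lookup;
# real systems use a stroke dictionary covering all CJK characters.
import numpy as np

STROKES = {
    "葉": [1, 2, 2, 1, 2, 2, 5, 1, 1, 2, 3, 4],  # traditional: richer signal
    "叶": [2, 5, 1, 1, 2],                        # simplified: fewer strokes
}

def stroke_ngrams(word, n_min=3, n_max=5):
    """Concatenate per-character strokes, then extract n-gram windows."""
    seq = [s for ch in word for s in STROKES[ch]]
    return [tuple(seq[i:i + n]) for n in range(n_min, n_max + 1)
            for i in range(len(seq) - n + 1)]

rng = np.random.default_rng(0)
ngram_vec = {}  # lazily initialised n-gram embeddings

def word_vector(word, dim=50):
    grams = stroke_ngrams(word)
    for g in grams:
        ngram_vec.setdefault(g, rng.normal(scale=0.1, size=dim))
    return np.mean([ngram_vec[g] for g in grams], axis=0)

# The traditional form yields far more n-grams than the simplified one,
# mirroring the "richer stroke signal" point in the quoted statement.
print(len(stroke_ngrams("葉")), "n-grams vs", len(stroke_ngrams("叶")))  # 27 vs 6
```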
“…The CBoW and SG models treat words as basic units, ignoring rich subword information and thereby significantly limiting their performance [11]. This shortcoming was addressed in [1] by extending the SG model with subword information.…”
Section: Word Embedding
confidence: 99%
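The subword extension of SG cited as [1] is, in all likelihood, the fastText approach, where a word's input vector is the sum of its character n-gram vectors so that morphologically related words share parameters. A minimal sketch under that assumption:

```python
# Hedged sketch of fastText-style subword skip-gram input vectors.
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract boundary-marked character n-grams plus the full word."""
    w = f"<{word}>"
    grams = {w[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)}
    grams.add(w)
    return grams

rng = np.random.default_rng(0)
DIM = 50
gram_vec = {}  # lazily initialised n-gram embeddings

def input_vector(word):
    grams = char_ngrams(word)
    for g in grams:
        gram_vec.setdefault(g, rng.normal(scale=0.1, size=DIM))
    return np.sum([gram_vec[g] for g in grams], axis=0)

# Related surface forms share most n-grams, hence similar vectors.
a, b = input_vector("embedding"), input_vector("embeddings")
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cos), 3))
```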
“…Word representation is one of the fundamental topics in Natural Language Processing (NLP). Many methods have been proposed to learn dense vector representations of words [1]–[4], and these have been successfully applied in various tasks such as language modeling [5] and text classification [6].…”
Section: Introduction
confidence: 99%