Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1039

Neural Word Segmentation Learning for Chinese

Abstract: Most previous approaches to Chinese word segmentation formalize this problem as a character-based sequence labeling task so that only contextual information within fixed sized local windows and simple interactions between adjacent tags can be captured. In this paper, we propose a novel neural framework which thoroughly eliminates context windows and can utilize complete segmentation history. Our model employs a gated combination neural network over characters to produce distributed representations of word cand…

Cited by 137 publications (133 citation statements)
References 29 publications

“…In contrast, we leverage both character embeddings and word embeddings for better accuracies. (Morita et al, 2015; Liu et al, 2016; Cai and Zhao, 2016), which are different from our work in the basic framework. For instance, Liu et al (2016) follow Andrew (2006) using a semi-CRF for structured inference.…”
Section: Error Analysis (contrasting)
confidence: 69%
“…As shown in Table 4, pre-training with conventional skip-gram embeddings gives only small improvements, which is consistent as findings of previous work (Chen et al, 2015a; Ma and 2015; Cai and Zhao, 2016). Segmentation with self-training even shows accuracy drops on PKU and MSR.…”
Section: In-domain Results (supporting)
confidence: 90%
“…The method of Chen et al [2015] is a character-based method and that of Cai and Zhao [2016] is a word-based method. The method of Cai et al [2017] is a new improved version of that by Cai and Zhao [2016]. The three methods do not use any unlabeled or partially-labeled data except source domain labeled data S_l.…”
Section: Results (mentioning)
confidence: 99%
“…These tags may indicate the position of a character in the word [Xue, 2003] or represent the intervals between characters [Huang et al, 2007]. Recently, along with the development of deep learning methods, some neural network models [Chen et al, 2015; Cai and Zhao, 2016; Liu et al, 2016; Cai et al, 2017] have achieved great success in CWS tasks. Despite their enormous success, however, these methods still have limitations: they usually rely heavily on manually labeled training data.…”
Section: Introduction (mentioning)
confidence: 99%