Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1072

Multi-Grained Chinese Word Segmentation

Abstract: Traditionally, word segmentation (WS) adopts the single-granularity formalism, where a sentence corresponds to a single word sequence. However, Sproat et al. (1996) show that the inter-native-speaker consistency ratio over Chinese word boundaries is only 76%, indicating that single-grained WS (SWS) imposes unnecessary challenges on both manual annotation and statistical modeling. Moreover, WS results of different granularities can be complementary and beneficial for high-level applications. This work proposes and add…

Cited by 11 publications (31 citation statements)
References 15 publications
“…Given an input sentence, the task of MWS is to retrieve all words of different granularities, which can be naturally organized as a hierarchical tree structure as shown in Figure 1 (right). Gong et al. (2017) propose several MWS approaches and show that treating MWS as constituent parsing leads to the best performance. They adopt the transition-based parser of Cross and Huang (2016), which greedily searches for an optimal shift-reduce action sequence to build a tree.…”
Section: Graph-based Model With Local Loss
confidence: 99%
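The greedy shift-reduce procedure described in the excerpt can be sketched as below. This is a hypothetical illustration of the control flow only: `score_action` and the toy scorer stand in for the learned action classifier of Cross and Huang (2016), not their actual model.

```python
# Minimal sketch of greedy shift-reduce tree building: at each step a scorer
# picks the highest-scoring legal action (SHIFT moves a token from the buffer
# onto the stack; REDUCE combines the top two stack items into one subtree).
def greedy_shift_reduce(tokens, score_action):
    stack, buffer = [], list(tokens)
    while buffer or len(stack) > 1:
        legal = []
        if buffer:
            legal.append("SHIFT")
        if len(stack) >= 2:
            legal.append("REDUCE")
        # Greedy: take the single best action now, no beam, no backtracking.
        action = max(legal, key=lambda a: score_action(stack, buffer, a))
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        else:
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))  # merge two subtrees into one node
    return stack[0]

# Toy scorer that always prefers SHIFT while the buffer is non-empty,
# which yields a right-branching tree.
tree = greedy_shift_reduce(["迈向", "充满", "希望"],
                           lambda s, b, a: 1.0 if a == "SHIFT" else 0.0)
# tree == ("迈向", ("充满", "希望"))
```

Because each action is chosen greedily, one early mistake propagates through the rest of the tree, which is part of the motivation the next excerpt gives for switching to a graph-based parser.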
“…They adopt the transition-based parser of Cross and Huang (2016), which greedily searches for an optimal shift-reduce action sequence to build a tree. In this work, instead of adopting the transition-based parser as Gong et al. (2017) do, we employ the graph-based parser of Stern et al. (2017) and replace the original global max-margin loss with a local span-wise loss (Joshi et al., 2018; Teng and Zhang, 2018) as our basic MWS model, due to two considerations: 1) the graph-based parser with local loss gains efficiency without hurting performance compared with the transition-based parser and the graph-based parser with global loss, which will be discussed in Section 5.3; 2) more importantly, this work aims to conduct an in-depth study of a simple, efficient, and effective way to incorporate weakly labeled data for MWS.…”
[Figure 2: Architecture of our MWS model.]
Section: Graph-based Model With Local Loss
confidence: 99%
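The local span-wise loss mentioned in the excerpt can be sketched as follows. This is a simplified illustration, assuming sigmoid span probabilities and a plain dict of scores in place of the neural span encoder; it shows only the "local" idea (an independent binary cross-entropy per candidate span against the gold tree), not the cited papers' exact formulation.

```python
import math

# Each candidate span (i, j) is scored independently; the loss is the mean
# binary cross-entropy between the span's predicted probability and whether
# the span appears in the gold tree. No tree-level margin is involved, which
# is what makes the objective "local".
def local_span_loss(span_scores, gold_spans):
    loss = 0.0
    for span, score in span_scores.items():
        p = 1.0 / (1.0 + math.exp(-score))          # sigmoid probability
        y = 1.0 if span in gold_spans else 0.0       # gold label for the span
        loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return loss / len(span_scores)

# Example: one gold span scored positively, one non-gold span scored negatively.
loss = local_span_loss({(0, 2): 2.0, (0, 1): -2.0}, {(0, 2)})
```

Training then reduces to per-span classification, which is one way to see why the excerpt reports efficiency gains over a global max-margin objective that must decode a full tree for each update.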
“…The change in F1-score as more source domains are introduced, in three different orders: Max-, Min-, and Rand-select. The red dotted line is the result reported by Chen et al. (2017) with the same model, trained on nine datasets. 1…”
confidence: 97%