Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.734
Improving Chinese Word Segmentation with Wordhood Memory Networks

Abstract: Contextual features always play an important role in Chinese word segmentation (CWS). Wordhood information, as one of those contextual features, has proved useful in many conventional character-based segmenters. However, this feature has received less attention in recent neural models, and it is also challenging to design a framework that properly integrates wordhood information from different wordhood measures into existing neural frameworks. In this paper, we therefore propose a neural framework, WMSEG, which uses memory networks to incorporate wordhood information into neural segmenters for CWS.
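The abstract describes a key-value memory mechanism for injecting wordhood information into a neural segmenter. Below is a minimal sketch of how such a wordhood memory module could look in PyTorch; the module name, tensor shapes, and the use of position-in-word tags as memory values are our assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class WordhoodMemory(nn.Module):
    """Sketch (not the authors' code) of a key-value memory that injects
    wordhood information into a character encoder's hidden states. Keys
    embed candidate n-grams covering a character; values embed the
    character's position inside each n-gram. All names are illustrative."""

    def __init__(self, hidden_size, ngram_vocab, pos_vocab):
        super().__init__()
        self.key_emb = nn.Embedding(ngram_vocab, hidden_size)  # candidate words
        self.val_emb = nn.Embedding(pos_vocab, hidden_size)    # position-in-word tags

    def forward(self, h, ngram_ids, pos_ids, mask):
        # h:         (batch, seq, hidden)  character hidden states (queries)
        # ngram_ids: (batch, seq, mem)     candidate n-grams per character
        # pos_ids:   (batch, seq, mem)     where the character sits in each n-gram
        # mask:      (batch, seq, mem)     1 for real memory slots, 0 for padding
        keys = self.key_emb(ngram_ids)                    # (b, s, m, h)
        vals = self.val_emb(pos_ids)                      # (b, s, m, h)
        scores = torch.einsum('bsh,bsmh->bsm', h, keys)   # dot-product addressing
        scores = scores.masked_fill(mask == 0, -1e9)
        attn = torch.softmax(scores, dim=-1)              # weight each candidate
        mem = torch.einsum('bsm,bsmh->bsh', attn, vals)   # weighted value sum
        return h + mem                                    # augment encoder output
```

In the paper's terms, the keys play the role of candidate words found by a wordhood measure, and the values tell the decoder where each character sits inside each candidate word.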

Cited by 82 publications (58 citation statements)
References 31 publications
“…Extra knowledge (e.g., pre-trained embeddings (Song et al., 2017; Song and Shi, 2018) and pre-trained models (Devlin et al., 2019; Diao et al., 2019)) can provide useful information and thus enhance model performance for many NLP tasks (Tian et al., 2020a,b,c). Specifically, memory and memory-augmented neural networks (Zeng et al., 2018; Santoro et al., 2018; Diao et al., 2020; Tian et al., 2020d) are another line of related research, which can be traced back to Weston et al. (2015), who proposed memory networks to leverage extra information for question answering; Sukhbaatar et al. (2015) then improved them with an end-to-end design that allows the model to be trained with less supervision. For Transformers in particular, memory-based methods have also been proposed.…”
Section: Base+RM+MCLN (mentioning)
confidence: 99%
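The statement above traces memory-augmented models back to memory networks and their end-to-end variant (Sukhbaatar et al., 2015). For readers unfamiliar with the mechanism, here is a single-hop sketch in the spirit of that design; the class and variable names are ours, for illustration only.

```python
import torch
import torch.nn as nn

class MemN2NHop(nn.Module):
    """One hop of an end-to-end memory network in the spirit of
    Sukhbaatar et al. (2015); an illustrative sketch, not their code."""

    def __init__(self, vocab_size, dim):
        super().__init__()
        self.A = nn.Embedding(vocab_size, dim)  # input (key) memory
        self.C = nn.Embedding(vocab_size, dim)  # output (value) memory

    def forward(self, query, story):
        # query: (batch, dim)        encoded question u
        # story: (batch, mem, words) supporting sentences as token ids
        m = self.A(story).sum(dim=2)            # (b, mem, dim) bag-of-words keys
        c = self.C(story).sum(dim=2)            # (b, mem, dim) values
        p = torch.softmax(torch.einsum('bd,bmd->bm', query, m), dim=-1)
        o = torch.einsum('bm,bmd->bd', p, c)    # attention-weighted response
        return query + o                        # query for the next hop
```

Stacking several such hops, with weight tying across the A and C embeddings, yields the end-to-end trainable architecture the citation refers to.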
“…Cui et al. [3] fine-tuned BERT and XLNet on an additional 4.5 billion Chinese tokens and improved performance on many downstream tasks. Tian et al. [4] showed that using a pre-trained BERT language model as the encoder, with a Conditional Random Field (CRF) decoder on top, can achieve 98.28% to 98.40% on the MSR2005 Chinese word segmentation test set.…”
Section: Related Work (mentioning)
confidence: 99%
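The BERT-plus-CRF pairing described above is a common sequence-tagging setup for CWS. A minimal sketch follows, assuming a B/M/E/S tag scheme, the Hugging Face transformers BertModel, and the pytorch-crf package; the hyperparameters and wiring are our assumptions, not the exact configuration from Tian et al. [4].

```python
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pip install pytorch-crf

class BertCrfSegmenter(nn.Module):
    """Hedged sketch of the encoder-decoder pairing described above:
    BERT hidden states projected to B/M/E/S tag scores, decoded by a CRF.
    Special-token (CLS/SEP) handling is omitted for brevity."""

    def __init__(self, num_tags=4, bert_name='bert-base-chinese'):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.proj = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.proj(h)                     # per-character tag scores
        mask = attention_mask.bool()
        if tags is not None:                         # training: CRF negative log-likelihood
            return -self.crf(emissions, tags, mask=mask, reduction='mean')
        return self.crf.decode(emissions, mask=mask)  # inference: best tag paths
```

The CRF layer adds tag-transition scores on top of BERT's per-character emissions, which is what distinguishes this decoder from a plain softmax classifier.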
“…However, for joint CWS and POS tagging, previous approaches to leveraging n-gram features are limited to directly concatenating the n-gram embeddings with the input character embeddings, where unimportant n-grams may mislead the model and result in incorrect predictions. Therefore, assigning appropriate weights to different n-grams according to their contexts is a potentially effective solution (Higashiyama et al., 2019; Tian et al., 2020b) for the joint task, and we propose multi-channel attention to address it. In detail, we first categorize n-grams by a specific metric, which in this study is either their frequency or their length, and then model the grouped n-grams in separate attention channels.…”
Section: The Multi-Channel Attentions (mentioning)
confidence: 99%
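The multi-channel attention described above groups candidate n-grams (by frequency or length) and attends within each group separately before fusing the results. Here is a sketch under those assumptions, with channels grouped by n-gram length and padding id 0; the module name and fusion layer are illustrative, not the citing paper's exact design.

```python
import torch
import torch.nn as nn

class MultiChannelNgramAttention(nn.Module):
    """Sketch of multi-channel n-gram attention: one attention per channel
    (e.g., one channel per n-gram length), each using the character hidden
    state as query, with the per-channel summaries fused at the end."""

    def __init__(self, hidden_size, ngram_vocab, num_channels):
        super().__init__()
        self.emb = nn.Embedding(ngram_vocab, hidden_size, padding_idx=0)
        self.fuse = nn.Linear(hidden_size * (num_channels + 1), hidden_size)

    def forward(self, h, channel_ngrams):
        # h:              (batch, seq, hidden) character hidden states
        # channel_ngrams: list of tensors (batch, seq, k), one per channel,
        #                 holding n-gram ids; 0 pads empty slots
        outs = [h]
        for ids in channel_ngrams:                       # one attention per channel
            e = self.emb(ids)                            # (b, s, k, h)
            scores = torch.einsum('bsh,bskh->bsk', h, e)
            scores = scores.masked_fill(ids == 0, -1e9)  # ignore padded slots
            w = torch.softmax(scores, dim=-1)
            outs.append(torch.einsum('bsk,bskh->bsh', w, e))
        return torch.tanh(self.fuse(torch.cat(outs, dim=-1)))
```

Keeping the channels separate lets the model learn, say, that long n-grams deserve different weighting behavior than frequent short ones, instead of forcing a single attention distribution over all candidates.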