Proceedings of the 2019 Conference of the North 2019
DOI: 10.18653/v1/n19-1247

An Encoding Strategy Based Word-Character

Abstract: A recently proposed lattice model has demonstrated that words in a character sequence can provide rich word boundary information for character-based Chinese NER models. In this model, word information is integrated into a shortcut path between the start and the end characters of the word. However, the existence of the shortcut path may cause the model to degenerate into a partial word-based model, which will suffer from word segmentation errors. Furthermore, the lattice model cannot be trained in batches due to its …
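To make the abstract's description concrete, the sketch below shows how lexicon words matched inside a character sequence yield (start, end) spans, which is where a lattice-style model would attach its shortcut paths. It is an illustration only, not the paper's implementation: the function name `match_lexicon_words`, the `max_len` parameter, and the toy lexicon are all assumptions introduced here.

```python
from typing import List, Set, Tuple


def match_lexicon_words(
    chars: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for every lexicon word found in `chars`.

    `end` is the inclusive index of the word's last character, so each
    tuple names the two characters a shortcut path would connect.
    """
    spans = []
    for start in range(len(chars)):
        # Only multi-character words carry boundary information.
        for length in range(2, max_len + 1):
            end = start + length
            if end > len(chars):
                break
            word = chars[start:end]
            if word in lexicon:
                spans.append((start, end - 1, word))
    return spans


# Hypothetical toy lexicon; a real system would use a large automatic lexicon.
sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
spans = match_lexicon_words(sentence, lexicon)
for start, end, word in spans:
    print(start, end, word)
```

The matched spans overlap and conflict (for example 南京市 versus 市长), which illustrates why a model that leans too heavily on such paths can inherit word segmentation errors.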

Cited by 83 publications (58 citation statements)
References 30 publications
“…Overall, in both R and P settings, ZEN outperforms BERT in all seven tasks, which clearly indicates the advantage of introducing n-grams into the encoding of character sequences. This observation is similar to that from Dos Santos and Gatti (2014); Lample et al (2016); Bojanowski et al (2017); Liu et al (2019a). In detail, when comparing R and P settings, [footnote 12: Most of the previous studies show their performance on the development set of the aforementioned tasks and we follow them to do so in order to provide a reference and comparison.]…”
Section: Overall Performance (supporting)
confidence: 66%
“…A drawback of the purely character-based NER model is that the word information is not fully exploited. To incorporate word information in Chinese NER, some recent methods, such as [10,11,12,13,14], resort to an automatically constructed lexicon.…”
Section: Chinese NER (mentioning)
confidence: 99%
“…Chinese word segmentation was performed first before applying character sequence labeling (Guo et al, 2004; Mao et al, 2008; Zhu and Wang, 2019). The pre-processing segmentation features included character positional embedding (Peng and Dredze, 2015; He and Sun, 2017a,b), segmentation tags (Zhu and Wang, 2019), word embedding (Peng and Dredze, 2015; Liu et al, 2019; E and Xiang, 2017) and so on. The other was to train NER and CWS tasks jointly to incorporate task-shared word boundary information from the CWS into the NER (Xu et al, 2013; Peng and Dredze, 2016; Cao et al, 2018).…”
Section: Related Work (mentioning)
confidence: 99%
“…However, they treated the segmentations equally without error discrimination. Liu et al (2019) introduced four naive selection strategies to select words from the pre-prepared Lexicon for their model. However, these strategies did not consider the context of a sentence.…”
Section: Related Work (mentioning)
confidence: 99%