Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.518

Entity Enhanced BERT Pre-training for Chinese NER

Abstract: Character-level BERT pre-trained on Chinese lacks lexicon information, which has been shown to be effective for Chinese NER. To integrate the lexicon into pre-trained LMs for Chinese NER, we investigate a semi-supervised entity enhanced BERT pre-training method. In particular, we first extract an entity lexicon from the relevant raw text using a new-word discovery method. We then integrate the entity information into BERT using Char-Entity-Transformer, which augments the self-attention using a comb…
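The abstract sketches a Char-Entity-Transformer that augments character-level self-attention with lexicon entities. As a rough illustration only, not the authors' exact formulation, the snippet below shows one way such an augmentation can look: keys and values for characters covered by a matched lexicon entry become gated mixtures of the character hidden state and an entity embedding. The module name, the entity_ids input, and the gating scheme are assumptions made for this sketch.

import torch
import torch.nn as nn

class EntityAugmentedSelfAttention(nn.Module):
    # Illustrative sketch: character queries attend over entity-enhanced keys/values.

    def __init__(self, hidden_size, num_entities, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Entity id 0 means "no matched lexicon entry"; its embedding stays zero.
        self.entity_emb = nn.Embedding(num_entities, hidden_size, padding_idx=0)
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, char_hidden, entity_ids, attention_mask=None):
        # char_hidden: (batch, seq_len, hidden); entity_ids: (batch, seq_len), 0 = no entity.
        ent = self.entity_emb(entity_ids)
        # Gate how much entity information flows into the keys and values.
        mix = torch.sigmoid(self.gate(torch.cat([char_hidden, ent], dim=-1)))
        kv = mix * ent + (1.0 - mix) * char_hidden
        key_padding_mask = attention_mask == 0 if attention_mask is not None else None
        out, _ = self.attn(char_hidden, kv, kv, key_padding_mask=key_padding_mask)
        return out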

Cited by 42 publications (24 citation statements)
References 25 publications

Citation statements (ordered by relevance):
“…ERNIE (Sun et al., 2019a,b) exploited entity-level and word-level masking to integrate knowledge into BERT in an implicit way. Jia et al. (2020) proposed Entity Enhanced BERT, further pre-training BERT using a domain-specific corpus and entity set with a carefully designed character-entity Transformer. ZEN (Diao et al., 2020) enhanced Chinese BERT with a multi-layered N-gram encoder but is limited by the small size of the N-gram vocabulary.…”
Section: Related Work (mentioning)
confidence: 99%
“…Besides standard benchmark datasets like CoNLL2003 and OntoNotes 4.0, we also choose some datasets closer to real-world applications to verify the actual utility of our methods, such as Twitter NER and Weibo in the social media domain. We use the same dataset preprocessing and splits as in previous work (Huang et al., 2015; Mengge et al., 2020; Jia et al., 2020; Nguyen et al., 2020).…”
Section: Dataset (mentioning)
confidence: 99%
“…Previous methods commonly incorporated gazetteers as additional features (Ghaddar and Langlais, 2018; Al-Olimat et al., 2018; Liu et al., 2019a; Lin et al., 2019; Rijhwani et al., 2020). For languages without explicit word boundaries, such as Chinese, incorporating a universal dictionary with common words besides gazetteers can be further helpful for NER (Liu et al., 2019b; Sui et al., 2019; Gui et al., 2019b,a; Ma et al., 2020; Li et al., 2020a; Jia et al., 2020). Dictionaries can also be used to construct distantly supervised data from unlabeled corpora.…”
Section: Enhancing NER With External Data (mentioning)
confidence: 99%
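The excerpt above refers to incorporating gazetteers and dictionaries as additional NER features. As a hedged illustration of that general idea, not any cited paper's exact scheme, the helper below tags each character with B/I/O according to longest-match lookups in a dictionary; the function name, the longest-match strategy, and the max_len cutoff are assumptions.

def gazetteer_features(chars, gazetteer, max_len=10):
    # chars: list of characters; gazetteer: set of entity strings.
    # Returns one dictionary-match tag per character: 'B', 'I', or 'O'.
    tags = ['O'] * len(chars)
    i = 0
    while i < len(chars):
        # Try the longest dictionary match starting at position i.
        for j in range(min(len(chars), i + max_len), i, -1):
            if ''.join(chars[i:j]) in gazetteer:
                tags[i] = 'B'
                for k in range(i + 1, j):
                    tags[k] = 'I'
                i = j
                break
        else:
            i += 1
    return tags

# Example: gazetteer_features(list("南京市长江大桥"), {"南京市", "长江大桥"})
# returns ['B', 'I', 'I', 'B', 'I', 'I', 'I']; such match tags are then fed to the
# NER model as extra input features alongside the characters.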
“…These two-stage methods allow using large-scale unlabeled data in pre-training and small labeled data in fine-tuning. In order to adapt to specific tasks or domains, variants of BERT have been proposed, including small and practical BERT (Tsai et al., 2019; Lan et al., 2020b; Jiao et al., 2020), domain-adaptive BERT (Yang et al., 2019a; Gururangan et al., 2020), and task-adaptive BERT (Xue et al., 2020; Jia et al., 2020). Our work performs further pre-training on BERT and proposes task-aware training objectives to improve NER.…”
Section: Two-Stage Training Paradigm for NLP (mentioning)
confidence: 99%
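The excerpt above describes the two-stage paradigm of further pre-training on unlabeled text followed by task-specific fine-tuning. Below is a minimal sketch of that general recipe, assuming the HuggingFace transformers library, the bert-base-chinese checkpoint, and a toy in-memory domain corpus; it is not the paper's implementation, and the corpus, label count, and training settings are placeholders.

from transformers import (AutoTokenizer, BertForMaskedLM,
                          BertForTokenClassification,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

# Stage 1: domain-adaptive further pre-training with the masked-LM objective.
domain_corpus = ["示例领域文本一。", "示例领域文本二。"]   # placeholder raw domain text
encodings = [tokenizer(t, truncation=True, max_length=128) for t in domain_corpus]
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

mlm_model = BertForMaskedLM.from_pretrained("bert-base-chinese")
trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="domain-bert", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=encodings,
    data_collator=collator,
)
trainer.train()
trainer.save_model("domain-bert")

# Stage 2: fine-tune the adapted encoder for NER (token classification);
# num_labels=7 stands in for a hypothetical BIO tag set.
ner_model = BertForTokenClassification.from_pretrained("domain-bert", num_labels=7)
# ...fine-tune ner_model on the small labeled NER dataset as usual.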