Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.518

Entity Enhanced BERT Pre-training for Chinese NER

Abstract: Character-level BERT pre-trained on Chinese lacks lexicon information, which has been shown to be effective for Chinese NER. To integrate the lexicon into pre-trained LMs for Chinese NER, we investigate a semi-supervised entity enhanced BERT pre-training method. In particular, we first extract an entity lexicon from the relevant raw text using a new-word discovery method. We then integrate the entity information into BERT using Char-Entity-Transformer, which augments the self-attention using a comb…
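The abstract sketches a Char-Entity-Transformer that augments character-level self-attention with lexicon entities. As a rough illustration only, not the authors' exact formulation, the snippet below shows one way such an augmentation can look: keys and values for characters covered by a matched lexicon entry become gated mixtures of the character hidden state and an entity embedding. The module name, the entity_ids input, and the gating scheme are assumptions made for this sketch.

import torch
import torch.nn as nn

class EntityAugmentedSelfAttention(nn.Module):
    # Illustrative sketch: character queries attend over entity-enhanced keys/values.

    def __init__(self, hidden_size, num_entities, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Entity id 0 means "no matched lexicon entry"; its embedding stays zero.
        self.entity_emb = nn.Embedding(num_entities, hidden_size, padding_idx=0)
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, char_hidden, entity_ids, attention_mask=None):
        # char_hidden: (batch, seq_len, hidden); entity_ids: (batch, seq_len), 0 = no entity.
        ent = self.entity_emb(entity_ids)
        # Gate how much entity information flows into the keys and values.
        mix = torch.sigmoid(self.gate(torch.cat([char_hidden, ent], dim=-1)))
        kv = mix * ent + (1.0 - mix) * char_hidden
        key_padding_mask = attention_mask == 0 if attention_mask is not None else None
        out, _ = self.attn(char_hidden, kv, kv, key_padding_mask=key_padding_mask)
        return out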

Cited by 42 publications (24 citation statements)
References 25 publications

Citation statements (ordered by relevance):
“…ERNIE (Sun et al., 2019a,b) exploited entity-level and word-level masking to integrate knowledge into BERT in an implicit way. Jia et al. (2020) proposed Entity Enhanced BERT, further pre-training BERT using a domain-specific corpus and entity set with a carefully designed character-entity Transformer. ZEN (Diao et al., 2020) enhanced Chinese BERT with a multi-layered N-gram encoder but is limited by the small size of the N-gram vocabulary.…”
Section: Related Work (mentioning)
confidence: 99%
“…Besides standard benchmark datasets like CoNLL2003 and OntoNotes 4.0, we also choose some datasets closer to real-world applications to verify the actual utility of our methods, such as Twitter NER and Weibo in the social media domain. We use the same dataset preprocessing and splits as in previous work (Huang et al., 2015; Mengge et al., 2020; Jia et al., 2020; Nguyen et al., 2020).…”
Section: Dataset (mentioning)
confidence: 99%
“…Previous methods commonly incorporated gazetteers as additional features (Ghaddar and Langlais, 2018; Al-Olimat et al., 2018; Liu et al., 2019a; Lin et al., 2019; Rijhwani et al., 2020). For languages without explicit word boundaries, such as Chinese, incorporating a universal dictionary with common words besides gazetteers can be further helpful for NER (Liu et al., 2019b; Sui et al., 2019; Gui et al., 2019b,a; Ma et al., 2020; Li et al., 2020a; Jia et al., 2020). Dictionaries can also be used to construct distantly supervised data from unlabeled corpora.…”
Section: Enhancing NER With External Data (mentioning)
confidence: 99%
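The excerpt above refers to incorporating gazetteers and dictionaries as additional NER features. As a hedged illustration of that general idea, not any cited paper's exact scheme, the helper below tags each character with B/I/O according to longest-match lookups in a dictionary; the function name, the longest-match strategy, and the max_len cutoff are assumptions.

def gazetteer_features(chars, gazetteer, max_len=10):
    # chars: list of characters; gazetteer: set of entity strings.
    # Returns one dictionary-match tag per character: 'B', 'I', or 'O'.
    tags = ['O'] * len(chars)
    i = 0
    while i < len(chars):
        # Try the longest dictionary match starting at position i.
        for j in range(min(len(chars), i + max_len), i, -1):
            if ''.join(chars[i:j]) in gazetteer:
                tags[i] = 'B'
                for k in range(i + 1, j):
                    tags[k] = 'I'
                i = j
                break
        else:
            i += 1
    return tags

# Example: gazetteer_features(list("南京市长江大桥"), {"南京市", "长江大桥"})
# returns ['B', 'I', 'I', 'B', 'I', 'I', 'I']; such match tags are then fed to the
# NER model as extra input features alongside the characters.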
“…These two-stage methods allow using large-scale unlabeled data in pre-training and small labeled data in fine-tuning. In order to adapt to specific tasks or domains, variants of BERT have been proposed, including small and practical BERT (Tsai et al., 2019; Lan et al., 2020b; Jiao et al., 2020), domain-adaptive BERT (Yang et al., 2019a; Gururangan et al., 2020), and task-adaptive BERT (Xue et al., 2020; Jia et al., 2020). Our work performs further pre-training on BERT and proposes task-aware training objectives to improve NER.…”
Section: Two-Stage Training Paradigm for NLP (mentioning)
confidence: 99%
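The excerpt above describes the two-stage paradigm of further pre-training on unlabeled text followed by task-specific fine-tuning. Below is a minimal sketch of that general recipe, assuming the HuggingFace transformers library, the bert-base-chinese checkpoint, and a toy in-memory domain corpus; it is not the paper's implementation, and the corpus, label count, and training settings are placeholders.

from transformers import (AutoTokenizer, BertForMaskedLM,
                          BertForTokenClassification,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

# Stage 1: domain-adaptive further pre-training with the masked-LM objective.
domain_corpus = ["示例领域文本一。", "示例领域文本二。"]   # placeholder raw domain text
encodings = [tokenizer(t, truncation=True, max_length=128) for t in domain_corpus]
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

mlm_model = BertForMaskedLM.from_pretrained("bert-base-chinese")
trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="domain-bert", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=encodings,
    data_collator=collator,
)
trainer.train()
trainer.save_model("domain-bert")

# Stage 2: fine-tune the adapted encoder for NER (token classification);
# num_labels=7 stands in for a hypothetical BIO tag set.
ner_model = BertForTokenClassification.from_pretrained("domain-bert", num_labels=7)
# ...fine-tune ner_model on the small labeled NER dataset as usual.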