2021
DOI: 10.48550/arxiv.2105.07148
Preprint
Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter

Abstract: Lexicon information and pre-trained models, such as BERT, have been combined to explore Chinese sequence labeling tasks due to their respective strengths. However, existing methods fuse lexicon features only via a shallow, randomly initialized sequence layer and do not integrate them into the bottom layers of BERT. In this paper, we propose Lexicon Enhanced BERT (LEBERT) for Chinese sequence labeling, which integrates external lexicon knowledge into BERT layers directly via a Lexicon Adapter layer. Compared …
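To make the adapter idea concrete, the following is a minimal, hypothetical PyTorch sketch of a lexicon adapter in the spirit described above: matched-word embeddings are projected into the BERT hidden space, weighted by a char-to-word attention, and added back to the character hidden states. The module and parameter names (LexiconAdapter, word_proj, attn_w) are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a lexicon adapter (assumed shapes and names, not the official LEBERT code).
import torch
import torch.nn as nn

class LexiconAdapter(nn.Module):
    def __init__(self, hidden_size: int, word_embed_dim: int):
        super().__init__()
        # Project matched-word embeddings into the BERT hidden space.
        self.word_proj = nn.Sequential(
            nn.Linear(word_embed_dim, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
        )
        # Bilinear char-to-word attention.
        self.attn_w = nn.Parameter(torch.empty(hidden_size, hidden_size))
        nn.init.xavier_uniform_(self.attn_w)
        self.layer_norm = nn.LayerNorm(hidden_size)

    def forward(self, char_hidden, word_embeds, word_mask):
        # char_hidden: (B, L, H) hidden states from a BERT layer
        # word_embeds: (B, L, W, D) embeddings of words matched to each character
        # word_mask:   (B, L, W) 1 for real matched words, 0 for padding
        words = self.word_proj(word_embeds)                               # (B, L, W, H)
        scores = torch.einsum("blh,hk,blwk->blw", char_hidden, self.attn_w, words)
        scores = scores.masked_fill(word_mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        # Characters with no matched word get an all-zero attention row instead of NaNs.
        attn = torch.where(word_mask.sum(-1, keepdim=True) > 0, attn, torch.zeros_like(attn))
        fused = torch.einsum("blw,blwh->blh", attn, words)                # weighted word vector
        return self.layer_norm(char_hidden + fused)
```

Following the description above (and the citing papers below), a module like this would sit between Transformer layers of BERT, taking the hidden states of one layer and passing the fused result to the next.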

Cited by 17 publications (23 citation statements) | References 37 publications
“…For the public datasets, the original test and validation sets are preserved and used in the respective evaluation and validation stages. The experimental results show that our proposed model outperforms traditional models, including BiLSTM-CRF, Lattice LSTM [20], Lexicon [21], BERT, and BERT-CRF, particularly in low-resource scenarios. Across all four datasets, our model consistently outperforms the widely used BERT-CRF model, achieving an average increase of 3% in F1 score.…”
Section: Methods (mentioning)
confidence: 98%
“…Lattice LSTM (Zhang and Yang, 2018): 89.37 / 90.84 / 90.10
BERT-CRF (Devlin et al., 2018): 88.46 / 92.35 / 90.37
ERNIE (Zhang et al., 2019): 88.87 / 92.27 / 90.53
FLAT (Li et al., 2020): 88.76 / 92.07 / 90.38
LEBERT (Liu et al., 2021): 86.53 / 92.91 / 89.60…”
Section: Dialoamc as a New Benchmark (mentioning)
confidence: 99%
“…Experimental settings We use several popular Chinese named entity models as baselines, including: 1) Lattice LSTM (Zhang and Yang, 2018), an extension of Char-LSTM that incorporates lexical information into a native LSTM; 2) BERT (Devlin et al., 2018), a bidirectional Transformer encoder with large-scale language pre-training; 3) ERNIE (Zhang et al., 2019), an improved BERT that adopts entity-level masking and phrase-level masking during pre-training; 4) FLAT (Li et al., 2020), a flat-lattice Transformer that converts the lattice structure into a flat structure consisting of spans; 5) LEBERT (Liu et al., 2021), a lexicon-enhanced BERT for Chinese sequence labelling, which integrates external lexicon knowledge into BERT layers via a lexicon adapter layer. We train each model for 10 epochs, using the default parameters of the corresponding code repository.…”
Section: Named Entity Recognition (mentioning)
confidence: 99%
“…Consider that the emission score reflects the capability of the preceding encoder; that is, the encoder cannot perfectly generalize to unseen (OOV) words, so the emission score can be biased. In other words, as analyzed in Wei et al. (2021), without a hard mechanism to enforce the transition rules, a conventional CRF can occasionally produce illegal predictions, i.e., wrong tag paths under the current decoding framework.…”
Section: PCRF Inference Layer (mentioning)
confidence: 99%
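The "illegal predictions" issue discussed above can be made concrete with a hard transition constraint: under a BIO scheme, a tag such as I-PER may only follow B-PER or I-PER, and masking every other transition during Viterbi decoding removes such paths by construction. The sketch below illustrates that masking mechanism only, not the cited paper's PCRF; the tag set and function names are hypothetical.

```python
# Hard-masked Viterbi decoding over BIO tags (illustrative sketch, assumed tag set).
import numpy as np

def build_transition_mask(tags):
    """mask[i, j] is True when tag j may legally follow tag i under BIO."""
    n = len(tags)
    mask = np.ones((n, n), dtype=bool)
    for i, prev in enumerate(tags):
        for j, curr in enumerate(tags):
            if curr.startswith("I-"):
                ent = curr[2:]
                # I-X is legal only after B-X or I-X of the same entity type.
                mask[i, j] = prev in (f"B-{ent}", f"I-{ent}")
    return mask

def constrained_viterbi(emissions, transitions, mask):
    """emissions: (seq_len, n_tags); transitions: (n_tags, n_tags) learned scores."""
    neg_inf = -1e9
    trans = np.where(mask, transitions, neg_inf)              # hard constraint on transitions
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        cand = score[:, None] + trans + emissions[t][None, :]  # (prev_tag, curr_tag)
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

tags = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]
mask = build_transition_mask(tags)
emissions = np.random.randn(6, len(tags))
transitions = np.random.randn(len(tags), len(tags))
print([tags[i] for i in constrained_viterbi(emissions, transitions, mask)])
```

A soft CRF only learns to make illegal transitions unlikely; the mask above forbids them outright, which is the kind of hard mechanism the quoted passage alludes to.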
“…Ke et al. (2020) present a CWS-specific pre-trained model, which employs a unified architecture to make use of segmentation knowledge from different criteria. Liu et al. (2021) propose a lexicon-enhanced BERT, which combines character and lexicon features as the input. In addition, it attaches a lexicon adapter between the Transformer layers to integrate lexicon knowledge into BERT.…”
Section: Introduction (mentioning)
confidence: 99%
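The "character and lexicon features as the input" step rests on matching each character of a sentence against an external word lexicon. Below is a minimal, hypothetical sketch of that matching step, assuming the lexicon is a plain Python set and using brute-force substring lookup; a real implementation would more likely use a trie, but the resulting character-to-word pairs are the same.

```python
# Illustrative character-to-word matching against a lexicon (assumed plain-set lexicon).
def match_lexicon(sentence, lexicon, max_word_len=4):
    """Return, for each character position, the lexicon words that cover it."""
    matches = [[] for _ in sentence]
    for start in range(len(sentence)):
        for end in range(start + 1, min(start + max_word_len, len(sentence)) + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Every character inside a matched word receives that word as a feature.
                for pos in range(start, end):
                    matches[pos].append(word)
    return matches

lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
print(match_lexicon("南京市长江大桥", lexicon))
```

Each character's list of matched words is what the adapter consumes as lexicon features; for example, the character 长 at position 3 is covered by 市长, 长江, and 长江大桥.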