2021
DOI: 10.1109/taslp.2021.3124365

Pre-Training With Whole Word Masking for Chinese BERT

Cited by 909 publications (324 citation statements)
References 16 publications
“…For the textual feature extraction, the Chinese BERT with whole word masking [36,37] is used, and the max length of text is set to 160. For efficient training, the feature-based approach is adopted on the pretrained language model, which means that the parameters of the pretrained language model are fixed.…”
Section: Settings (mentioning)
confidence: 99%
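For readers who want to see what this setup looks like in practice, below is a minimal sketch of the feature-based use of Chinese BERT with whole word masking described in the excerpt: the pretrained parameters are frozen and text is truncated to 160 tokens. It assumes the Hugging Face transformers library and the publicly released hfl/chinese-bert-wwm checkpoint; it is an illustration, not the cited authors' code.

```python
# Minimal sketch (illustrative, not the cited authors' code): feature-based
# encoding with a frozen Chinese BERT-wwm model, max text length 160.
# Assumes the Hugging Face "transformers" library and the released
# "hfl/chinese-bert-wwm" checkpoint.
import torch
from transformers import BertModel, BertTokenizer

MODEL_NAME = "hfl/chinese-bert-wwm"  # assumed checkpoint name
MAX_LEN = 160                        # max text length from the cited settings

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
encoder = BertModel.from_pretrained(MODEL_NAME)

# Feature-based approach: freeze all pretrained parameters so only a
# downstream task head (not shown) would be trained.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

def extract_features(texts):
    """Return fixed sentence-level features for a batch of Chinese texts."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=MAX_LEN, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**batch)
    # Use the [CLS] token representation as the sentence-level feature.
    return outputs.last_hidden_state[:, 0, :]

features = extract_features(["全词掩码预训练的中文BERT。", "这是一个示例句子。"])
print(features.shape)  # (batch_size, hidden_size), e.g. (2, 768)
```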
“…Further, a great deal of work has been done exploring the language-agnostic properties of BERT including the ability to perform well on Arabic [14] and Chinese [15] texts. Other transformer models with similar attention-based architectures show further improvements.…”
Section: A Deep Learning in NLP (mentioning)
confidence: 99%
“…To sufficiently compare the latest NLP Transformer models with other traditional NLP deep learning models, we use Bidirectional Encoder Representations from Transformers (BERT) [8] as the representative due to its widespread popularity and bidirectional properties. Moreover, BERT has been shown to perform well at extracting information not only from English text but also from languages with very different structures, including Arabic [14] and Chinese [15]. These language- and structure-agnostic properties of BERT make it an attractive choice for applications outside of written language.…”
Section: BERT (mentioning)
confidence: 99%
“…Representative autoregressive language models are word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), ELMo (Peters et al., 2018), GPT (Radford et al., 2018), GPT-2 (Radford et al., 2019) and XLNet (Yang et al., 2019), and they are more suitable for text generation tasks. Representative autoencoding language models are BERT (Devlin et al., 2018), BERT-wwm (Cui et al., 2019), RoBERTa (Liu et al., 2019), ALBERT (Lan et al., 2019), ERNIE (Sun et al., 2019a), ERNIE-2 (Sun et al., 2019b) and ELECTRA (Clark et al., 2020), and they are more suitable for entity and relation extraction.…”
Section: Related Work (mentioning)
confidence: 99%
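To make the distinction drawn in this excerpt concrete, the sketch below contrasts the autoencoding (masked-token) objective with the autoregressive (next-token) objective using Hugging Face pipelines. The model names bert-base-chinese and gpt2 are illustrative assumptions, not models used in the cited work.

```python
# Illustrative contrast between the two pre-training objectives; not code
# from any cited work. Model names are assumptions chosen for availability.
from transformers import pipeline

# Autoencoding (masked) objective: predict a masked token from context on both sides.
fill = pipeline("fill-mask", model="bert-base-chinese")
print(fill("北京是中国的[MASK]都。")[0]["token_str"])  # top prediction, e.g. "首"

# Autoregressive objective: predict the next token from left context only.
generate = pipeline("text-generation", model="gpt2")
print(generate("The capital of China is", max_new_tokens=5)[0]["generated_text"])
```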