2023
DOI: 10.48550/arxiv.2303.10893
Preprint
Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models

Cited by: 1 publication (1 citation statement)
References: 0 publications
“…By predicting masked words, BERT learns rich linguistic features. Whole Word Masking BERT is a variant of BERT that masks entire words during the pre-training phase, instead of individual letters or characters [91]. This method better handles semantic information in language, especially for languages with complex structures like Chinese.…”
Section: Baseline
confidence: 99%
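The citation statement above describes whole word masking only at a high level. The following is a minimal sketch, not code from the cited paper, contrasting character-level masking with whole-word masking for Chinese text; the example sentence, its hard-coded segmentation, and the masking probability are illustrative assumptions (a real pipeline would use a segmenter and a BERT tokenizer).

```python
# Sketch: character-level masking vs. whole-word masking for Chinese.
# All values below are illustrative; this is not the cited paper's implementation.
import random

MASK = "[MASK]"

def char_level_mask(chars, mask_prob=0.15, seed=0):
    """Mask individual characters independently (original BERT-style)."""
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_prob else c for c in chars]

def whole_word_mask(words, mask_prob=0.15, seed=0):
    """Mask every character of a selected word together (WWM-style)."""
    rng = random.Random(seed)
    out = []
    for word in words:
        if rng.random() < mask_prob:
            out.extend([MASK] * len(word))  # all characters of the word are masked as a unit
        else:
            out.extend(list(word))
    return out

if __name__ == "__main__":
    sentence = "模型学习语言特征"
    words = ["模型", "学习", "语言", "特征"]  # assumed word segmentation for illustration
    print("char-level:", "".join(char_level_mask(list(sentence), mask_prob=0.3)))
    print("whole-word:", "".join(whole_word_mask(words, mask_prob=0.3)))
```

With whole-word masking, a selected word such as 语言 is masked in full, so the model must predict the word from sentence context rather than recovering one character from its sibling character.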