Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.119

LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization

Abstract: Language model pre-training based on large corpora has achieved tremendous success in terms of constructing enriched contextual representations and has led to significant performance gains on a diverse range of Natural Language Understanding (NLU) tasks. Despite the success, most current pre-trained language models, such as BERT, are trained based on single-grained tokenization, usually with fine-grained characters or sub-words, making it hard for them to learn the precise meaning of coarse-grained words and p…
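To make the single-grained vs. multi-grained contrast concrete, below is a small, self-contained Python sketch that produces a fine-grained (character) and a coarse-grained (word) token stream for the same Chinese sentence. The word list and the greedy longest-match segmenter are toy assumptions for illustration only; a real pipeline would use a proper Chinese word segmenter and vocabulary.

```python
# Toy sketch: fine-grained (character) vs. coarse-grained (word) tokenization.
# WORDS is a hypothetical coarse-grained vocabulary, not from the paper.

WORDS = {"语言", "模型", "预训练"}

def coarse_segment(text: str) -> list[str]:
    """Greedy longest-match segmentation into words, falling back to single characters."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + 4), i, -1):  # try the longest span first
            if text[i:j] in WORDS or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

text = "语言模型预训练"
fine = list(text)              # fine-grained: one token per character
coarse = coarse_segment(text)  # coarse-grained: multi-character words
print(fine)    # ['语', '言', '模', '型', '预', '训', '练']
print(coarse)  # ['语言', '模型', '预训练']
```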

Cited by 6 publications (8 citation statements). References 15 publications.

Citation statements (ordered by relevance):
“…On the contrary, we will get this information more easily if we look at them as a whole. The potential of such natural lexical units has been demonstrated in natural language models [12,46]. Similarly, as shown in Figure 1(c) and Figure 1(e), compared with image patches, salient visual regions can reflect richer semantic information.…”
Section: Introduction (mentioning)
confidence: 67%
“…However, existing layout-aware multimodal Transformers mainly focus on fine-grained information such as words and image patches. They ignore coarse-grained information including natural lexical units [12,46], like multi-word expressions and phrases, and salient visual regions [1,21], like attractive or prominent image regions. These coarse-grained elements contain high-density information and consistent semantics, which are valuable for document understanding.…”
Section: Introduction (mentioning)
confidence: 99%
“…The first category uses word information in the pretraining stage but represents a text as a sequence of characters when the pretrained model is applied to downstream tasks (Cui et al, 2019a;Lai et al, 2021). The second category uses word information when the pretrained model is used in downstream tasks (Su, 2020;Guo et al, 2021). In this paper, MarKBERT incorporate the boundary information in the training process latently.…”
Section: Related Work (mentioning)
confidence: 99%
“…-LICHEE (Guo et al, 2021): a multi-granularity Chinese pre-trained model that incorporates word and character representations at the embedding level.…”
Section: Experiments on Language Understanding Task (mentioning)
confidence: 99%
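The statement above describes LICHEE as incorporating word and character representations at the embedding level. The following is a minimal, hypothetical PyTorch sketch of that general idea: each fine-grained position also carries the id of the coarse-grained word covering it, and the two embeddings are pooled element-wise before being fed to the encoder. The class and the pooling operator are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class MultiGrainedEmbedding(nn.Module):
    """Illustrative fusion of fine-grained (character/sub-word) and
    coarse-grained (word) embeddings at the input layer; a sketch,
    not the paper's implementation."""

    def __init__(self, fine_vocab_size: int, coarse_vocab_size: int, dim: int):
        super().__init__()
        self.fine_emb = nn.Embedding(fine_vocab_size, dim)
        self.coarse_emb = nn.Embedding(coarse_vocab_size, dim)

    def forward(self, fine_ids: torch.Tensor, coarse_ids: torch.Tensor) -> torch.Tensor:
        # fine_ids:   (batch, seq_len) character/sub-word ids
        # coarse_ids: (batch, seq_len) id of the word covering each position,
        #             aligned by the multi-grained tokenizer
        fine = self.fine_emb(fine_ids)
        coarse = self.coarse_emb(coarse_ids)
        # Element-wise max pooling is one plausible fusion operator (assumption).
        return torch.maximum(fine, coarse)

# Toy usage: shapes only, random ids.
layer = MultiGrainedEmbedding(fine_vocab_size=21128, coarse_vocab_size=50000, dim=768)
fused = layer(torch.randint(0, 21128, (2, 8)), torch.randint(0, 50000, (2, 8)))
print(fused.shape)  # torch.Size([2, 8, 768]) -> input to the Transformer encoder
```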