Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.139

Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling

Abstract: Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling (xSL) tasks, such as cross-lingual machine reading comprehension (xMRC), by transferring knowledge from a high-resource language to low-resource languages. Despite this success, we draw an empirical observation that there is a training objective gap between the pre-training and fine-tuning stages: e.g., the masked language modeling objective requires local understanding of the masked token and the…
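To make the objective gap the abstract describes concrete, below is a minimal sketch (not from the paper) contrasting the two losses: masked language modeling predicts a token identity over the vocabulary at each masked position, while a span-extraction fine-tuning task like xMRC scores start/end positions over the whole sequence. All tensor sizes and labels here are toy values invented for illustration; any encoder producing contextual hidden states would slot in the same way.

```python
import torch
import torch.nn.functional as F

# Toy setup (hypothetical sizes): batch=2, seq_len=8, hidden=16, vocab=100.
batch, seq_len, hidden, vocab = 2, 8, 16, 100
hidden_states = torch.randn(batch, seq_len, hidden)  # stand-in for encoder output

# --- Pre-training objective: masked language modeling (MLM) ---
# A local, token-level decision: recover each masked token's identity
# from its contextual representation, classifying over the vocabulary.
mlm_head = torch.nn.Linear(hidden, vocab)
mlm_logits = mlm_head(hidden_states)                 # (batch, seq_len, vocab)
mlm_labels = torch.randint(0, vocab, (batch, seq_len))
mlm_labels[:, ::2] = -100                            # non-masked positions are ignored
mlm_loss = F.cross_entropy(mlm_logits.view(-1, vocab),
                           mlm_labels.view(-1), ignore_index=-100)

# --- Fine-tuning objective: span extraction (e.g., xMRC) ---
# A sequence-level decision: score every position as a candidate answer
# start/end, classifying over positions rather than over the vocabulary.
span_head = torch.nn.Linear(hidden, 2)
start_logits, end_logits = span_head(hidden_states).split(1, dim=-1)
start_logits, end_logits = start_logits.squeeze(-1), end_logits.squeeze(-1)
start_pos = torch.tensor([1, 3])                     # toy gold answer spans
end_pos = torch.tensor([2, 5])
span_loss = F.cross_entropy(start_logits, start_pos) + \
            F.cross_entropy(end_logits, end_pos)

print(f"MLM loss: {mlm_loss.item():.3f}, span loss: {span_loss.item():.3f}")
```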

Cited by 3 publications (3 citation statements, published 2022–2023). References 31 publications.
“…Furthermore, we collect Orca in few-shot settings, challenging models to learn unseen domains from few samples. Although one could argue that previous full-data CMRC datasets also support few-shot training for CMRC models, we contend that this leads to unclear comparisons due to the inconsistent settings of different works on these datasets (Chen et al., 2022b). In contrast, we present a single standard benchmark for thorough comparisons.…”
Section: B Related Work (mentioning)
confidence: 94%
“…Based on these challenging datasets, a great number of end-to-end approaches have been proposed, including BiDAF (Seo et al., 2016), DCN (Xiong et al., 2016), and R-Net (Wang et al., 2017). In MRC tasks, attention mechanisms (Dong et al., 2020a; Gao et al., 2020; Zhu et al., 2020; Chen et al., 2022) have become an essential component for capturing dependencies regardless of their distance in the input/output sequences. Recently, some works have shown that well pre-trained models are powerful and convenient for downstream tasks, such as R-Trans (Liu et al., 2019a), DCMN+, ALBERT (Lan et al., 2020), and GF-Net (Lee and Kim, 2020), which motivates us to take pre-trained models as our backbone encoder.…”
Section: Related Work (mentioning)
confidence: 99%
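The statement above notes that attention captures dependencies regardless of distance in the sequence. As a hedged illustration of that point (a generic sketch, not any cited paper's specific model), plain scaled dot-product self-attention lets every position attend to every other position in a single step:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Every query position attends to every key position, so a
    dependency between tokens costs the same whether they are
    adjacent or at opposite ends of the sequence."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)   # (..., seq_len, seq_len)
    return weights @ v

# Toy input: one sequence of 6 tokens with 16-dim representations.
x = torch.randn(1, 6, 16)
out = scaled_dot_product_attention(x, x, x)  # self-attention
print(out.shape)  # torch.Size([1, 6, 16])
```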
“…Much of this popularity can be attributed to the release of many annotated and publicly available datasets (Rajpurkar et al., 2016; Trischler et al., 2016; Chen et al., 2022a; You et al., 2022; Chen et al., 2023a). Formally, these MRC efforts can be classified into the two most popular streams from the answer-type perspective: span extraction (Rajpurkar et al., 2016; Trischler et al., 2016; Cui et al., 2019; Chen et al., 2022b; You et al., 2021a) and multiple choice (Lai et al., 2017; Zellers et al., 2018; Wang et al., 2020). The former requires the model to locate a text span in the given passage as the answer, e.g., SQuAD (Rajpurkar et al., 2016) and NewsQA (Trischler et al., 2016).…”
Section: Introduction (mentioning)
confidence: 99%