Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.380

On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model

Abstract: Many recent studies on large-scale language models have reported successful in-context zero- and few-shot learning ability. However, an in-depth analysis of when in-context learning occurs is still lacking. For example, it is unknown how in-context learning performance changes as the training corpus varies. Here, we investigate the effects of the source and size of the pretraining corpus on in-context learning in HyperCLOVA, a Korean-centric GPT-3 model. From our in-depth investigation, we introduce the follow…


Cited by 15 publications (6 citation statements). References 7 publications (28 reference statements).
“…Pre-training Stage: We first introduce influence factors in the LLM pretraining stage. Shin et al. (2022a) investigated the influence of the pretraining corpora. They found that the domain source is more important than the corpus size.…”
Section: What Influences ICL Performance
confidence: 99%
“…Pretraining: pretraining corpus domain (Shin et al., 2022a); pretraining corpus combination (Shin et al., 2022a); number of model parameters (Wei et al., 2022b; Brown et al., 2020); number of pretraining steps (Wei et al., 2022b). Inference: label space exposure (Min et al., 2022c); demonstration input distribution (Min et al., 2022c); format of input-label pairing (Min et al., 2022c); demonstration input-label mapping (Min et al., 2022c); demonstration sample ordering (Lu et al., 2022); demonstration-query similarity. Table 3: Summary of factors that have a relatively strong correlation to ICL performance.…”
Section: Pretraining
confidence: 99%
“…However, as in-context learning does not require updating PLM parameters, there arises the problem of distribution mismatch between the data used for LM pre-training and the test samples used in in-context learning, which hinders the full exploitation of the knowledge encoded in PLMs (Ge et al., 2022; Shin et al., 2022). To alleviate the context shift, existing methods rely on prior knowledge to increase the overlap between the two distributions.…”
Section: Test
confidence: 99%
“…Large Language Models (LLMs) have succeeded in advancing the state of the art for many Natural Language Processing (NLP) tasks [Devlin et al., 2019, Brown et al., 2020, Rae et al., 2021, Thoppilan et al., 2022, Chowdhery et al., 2022, Scao et al., 2022, Zhang et al., 2022b, Bai et al., 2022, Touvron et al., 2023], benefiting from the ultra-large-scale training corpora and computation resources. To unleash the LLMs' power of adaptation on unseen tasks without any parameter updates, in-context learning (ICL) has become one of the flourishing research topics, aiming at generating the prediction by conditioning on a few labeled exemplars (Figure 1 (a)) [Dong et al., 2023, Zhao et al., 2021, Shin et al., 2022, Lu et al., 2022].…”
Section: Introduction
confidence: 99%
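
The Introduction excerpt above describes the basic ICL setup: the model conditions on a few labeled exemplars prepended to the query and produces the prediction as a continuation, with no parameter updates. The snippet below is a minimal sketch of that prompt-construction step only; the sentiment task, the exemplars, and the "Review/Sentiment" template are illustrative assumptions, not taken from the cited papers or from HyperCLOVA.

# Minimal sketch of few-shot in-context learning prompt construction.
# The language model is later asked to continue this prompt; its parameters
# are never updated. Task, labels, and exemplars below are hypothetical.

exemplars = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret buying this laptop.", "negative"),
    ("The soup was warm and comforting.", "positive"),
]

def build_icl_prompt(exemplars, query):
    """Concatenate labeled input-label pairs, then append the unlabeled query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in exemplars]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

prompt = build_icl_prompt(exemplars, "The ending felt rushed and hollow.")
print(prompt)  # Feed this string to an LLM; the continuation is the predicted label.

Factors listed in the quoted Table 3 (exemplar ordering, input-label format, label space exposure) correspond to choices made inside this prompt-building step, which is why they can shift ICL accuracy without any change to the model itself.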