Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.274

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

Abstract: GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performances of different sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization…
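As a rough illustration of the in-context learning setup the abstract refers to, the sketch below builds a few-shot prompt from labeled demonstrations and leaves the final label for the model to complete, with no gradient updates. The `generate` call is a hypothetical completion endpoint, not HyperCLOVA's actual API.

```python
# Minimal sketch of few-shot in-context learning: demonstrations are placed in
# the prompt and the model completes the final, unlabeled query.

def build_few_shot_prompt(examples, query):
    """Concatenate (input, label) demonstrations followed by the unlabeled query."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

examples = [
    ("The acting was wonderful.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A film I will happily watch again.")
# completion = generate(prompt, max_tokens=1)  # hypothetical LM completion call
print(prompt)
```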

Cited by 43 publications (38 citation statements) · References 8 publications
“…Pre-trained extreme-scale language models (e.g., GPT-3 (175B) [6], HyperCLOVA (204B) [8], and Megatron Turing NLG (530B) [10]) are usually not publicly available. Thus, in this work, our detailed analysis of group-wise quantization and nuQmm is limited to relatively smaller models (such as GPT Neo).…”
Section: Discussion · Citation type: mentioning · Confidence: 99%
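For readers unfamiliar with the group-wise quantization mentioned in this statement, the following is a minimal NumPy sketch (an illustrative assumption, not the nuQmm implementation): each group of consecutive weights receives its own scale, which typically lowers quantization error compared with a single per-tensor scale.

```python
import numpy as np

def groupwise_quantize(weights, group_size=64, n_bits=4):
    """Quantize a 1-D weight vector to signed integers with one scale per group."""
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 7 for 4-bit signed
    w = weights.reshape(-1, group_size)               # assumes length divisible by group_size
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales                                  # dequantize with q * scales

w = np.random.randn(256).astype(np.float32)
q, s = groupwise_quantize(w)
print("max abs error:", np.abs(w - (q * s).reshape(-1)).max())
```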
“…In addition to the architectural advantages of the Transformer to scale the model size, generative LMs are increasing the number of parameters (as depicted in Table I) because: 1) self-supervised learning alleviates the burden of the expensive labeling process and 2) scaling-law [6], [7] provides a predictable performance on the cross-entropy loss as the model size increases. Surprising qualitative evaluation results (e.g., human-like writing) of extreme-scale LMs also fueled the competition in model size [8], [9].…”
Section: A. Generative Language Models · Citation type: mentioning · Confidence: 99%
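The scaling-law claim above can be made concrete with a small sketch: test loss is modeled as a power law in parameter count, L(N) ≈ (N_c / N)^α, so performance at a larger model size can be extrapolated from smaller runs. The constants below follow the commonly quoted Kaplan-style form but are placeholders rather than values taken from the cited papers.

```python
# Illustrative power-law loss curve in (non-embedding) parameter count N.

def predicted_loss(n_params, n_c=8.8e13, alpha=0.076):
    """L(N) = (N_c / N) ** alpha, with placeholder constants."""
    return (n_c / n_params) ** alpha

for n in (1.3e9, 13e9, 82e9, 175e9):
    print(f"{n / 1e9:6.1f}B params -> predicted loss {predicted_loss(n):.3f}")
```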
“…various large-scale in-context language models have been proposed (Black et al., 2021; Kim et al., 2021; Zeng et al., 2021; Rae et al., 2021; Hoffmann et al., 2022; Chowdhery et al., 2022).…”
Section: Introduction · Citation type: mentioning · Confidence: 99%
“…In addition, analysis on the relation between the validation perplexity of a language model and in-context learning performance is still less investigated. Previous research on in-context learning implicitly assumes that perplexity is predictive of in-context learning performance by showing the scaling-law property of their model (Kaplan et al., 2020; Brown et al., 2020; Kim et al., 2021). Rae et al. (2021) also use perplexity for hyperparameter selection on corpus reweighting in the pretraining of their in-context learner.…”
Section: Introduction · Citation type: mentioning · Confidence: 99%
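Since the statement above turns on validation perplexity, a minimal sketch of how it is computed may help: perplexity is the exponential of the average per-token negative log-likelihood, so ranking models by perplexity is equivalent to ranking them by cross-entropy loss. The token probabilities below are made up for illustration.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over a held-out token sequence."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

log_probs = [math.log(p) for p in (0.25, 0.10, 0.60, 0.05)]  # per-token model probabilities
print(f"perplexity = {perplexity(log_probs):.2f}")
```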