2021
DOI: 10.48550/arxiv.2105.13880
Preprint

Knowledge Inheritance for Pre-trained Language Models

Abstract: Recent explorations of large-scale pre-trained language models (PLMs) such as GPT-3 have revealed the power of PLMs with huge amounts of parameters, setting off a wave of training ever-larger PLMs. However, training a large-scale PLM requires tremendous amounts of computational resources, which is time-consuming and expensive. In addition, existing large-scale PLMs are mainly trained from scratch individually, ignoring the availability of many existing well-trained PLMs. To this end, we explore the question that…

Cited by 5 publications (8 citation statements)
References 52 publications (83 reference statements)
“…Regarding similar works, current research in knowledge distillation has primarily focused on transferring knowledge in the domain of language models [13,14] and image classification tasks [10,15-17]. Yet there have been fewer works in other fields, such as object detection [52] and segmentation [53], domain generalization [54], and video classification.…”
Section: How Our Work Fits in the Literature
confidence: 99%
“…Knowledge distillation [10-12] is a widely used technique for creating a smaller version of a pretrained model that meets specific application needs, and it has recently been explored as a knowledge transfer technique. Current research primarily focuses on transferring knowledge in the domain of language models [13,14] and image classification tasks [10,15-17]. Nevertheless, as far as we know, the potential of knowledge distillation as a means of knowledge transfer, rather than just compression, has not been thoroughly investigated in video-based human action recognition, which is where our contributions mainly lie.…”
Section: Introduction
confidence: 99%
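To make the distillation technique referenced in the excerpt above concrete, here is a minimal sketch of the classic soft-label distillation objective (temperature-scaled KL divergence blended with cross-entropy). It illustrates only the general idea; the function name, temperature, and weighting are illustrative assumptions and are not drawn from any of the cited works.

```python
# Minimal sketch of a standard knowledge-distillation loss (soft labels from a
# teacher plus hard-label cross-entropy). Values of `temperature` and `alpha`
# are illustrative assumptions, not settings from the cited papers.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a hard-label cross-entropy term with a soft-label KL term."""
    # Softened teacher and student distributions at the same temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between softened distributions, rescaled by T^2 as usual.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    # Ordinary supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Usage with random tensors standing in for real model outputs.
student_logits = torch.randn(8, 10, requires_grad=True)  # batch of 8, 10 classes
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```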
“…A more evident performance gain is observed, especially under low-resource settings. (b) Besides conducting pre-finetuning on supervised small-scale datasets, another line of work conducts pre-finetuning on domain-specific unlabeled data and shows that additional adaptation towards a certain domain could provide significant benefits [77,527,528]. (2) Understanding the success of pre-finetuning.…”
Section: Multi-task Learning
confidence: 99%
“…Dynamically enlarging the model size has been used to pre-train big models [570] and has achieved promising results. Moreover, Qin et al. [527] further propose "knowledge inheritance" to continually absorb knowledge from existing trained big models to learn larger and better big models.…”
Section: Continual Learning
confidence: 99%
“…Knowledge inheritance (Qin et al., 2021) is related to our knowledge integration. Knowledge inheritance usually inherits knowledge from a small pre-trained model and then speeds up the training of large models.…”
Section: Knowledge Integration
confidence: 99%
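The excerpts above describe knowledge inheritance as a "reversed" use of distillation: a smaller, already-trained PLM supervises a larger student at the start of pre-training, and the inherited signal is phased out so the student eventually learns on its own. The sketch below illustrates that general idea only; the linear annealing schedule, the 0.5 starting weight, and the loss combination are illustrative assumptions, not the exact recipe of Qin et al. (2021).

```python
# Minimal sketch of the knowledge-inheritance idea: masked-language-modeling
# loss plus an annealed distillation term from a smaller trained teacher.
# The schedule and weights are assumptions for illustration.
import torch
import torch.nn.functional as F

def inheritance_weight(step: int, total_steps: int, start: float = 0.5) -> float:
    """Linearly decay the inherited-loss weight from `start` to 0."""
    return start * max(0.0, 1.0 - step / total_steps)

def knowledge_inheritance_loss(student_logits: torch.Tensor,
                               teacher_logits: torch.Tensor,
                               mlm_labels: torch.Tensor,
                               step: int,
                               total_steps: int,
                               temperature: float = 1.0) -> torch.Tensor:
    """Self-supervised MLM loss plus an annealed distillation term from the teacher."""
    vocab = student_logits.size(-1)
    # Ordinary masked-language-modeling loss (label -100 marks unmasked positions).
    self_loss = F.cross_entropy(student_logits.view(-1, vocab),
                                mlm_labels.view(-1), ignore_index=-100)
    # Inherited loss: match the teacher's predictive distribution per token.
    log_p = F.log_softmax(student_logits / temperature, dim=-1).view(-1, vocab)
    q = F.softmax(teacher_logits / temperature, dim=-1).view(-1, vocab)
    kd_loss = F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2
    w = inheritance_weight(step, total_steps)
    return (1.0 - w) * self_loss + w * kd_loss

# Toy example: batch of 4 sequences, length 16, vocabulary of 100 tokens.
student_logits = torch.randn(4, 16, 100, requires_grad=True)
teacher_logits = torch.randn(4, 16, 100)   # produced by the smaller trained PLM
mlm_labels = torch.randint(0, 100, (4, 16))
loss = knowledge_inheritance_loss(student_logits, teacher_logits,
                                  mlm_labels, step=1000, total_steps=100000)
loss.backward()
```

Early in training the teacher's soft predictions dominate, which is what lets the larger student skip part of the expensive from-scratch phase; as the weight decays, the student relies only on its own self-supervised objective.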