Kronecker Decomposition for GPT Compression

Edalati, Ali; Tahaei, Marzieh S.; Ahmad, Robiah; Nia, Vahid Partovi; Clark, James J.; Rezagholizadeh, Mehdi

doi:10.18653/v1/2022.acl-short.24

Cited by 10 publications

(7 citation statements)

References 29 publications


“…DRONE achieves better performance than SVD. Besides, as an alternative to SVD, Kronecker decomposition retains the rank of the matrix and has shown improvement compressing BERT and GPT-2 (Tahaei et al. 2021; Edalati et al. 2022).…”

Section: Low-rank Factorization (mentioning)

confidence: 99%

A Survey on Model Compression and Acceleration for Pretrained Language Models

McAuley

2023

AAAI


Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long inference delay prevent Transformer-based pretrained language models (PLMs) from seeing broader adoption including for edge and mobile computing. Efficient NLP research aims to comprehensively consider computation, time and carbon emission for the entire life-cycle of NLP, including data preparation, model training and inference. In this survey, we focus on the inference stage and review the current state of model compression and acceleration for pretrained language models, including benchmarks, metrics and methodology.
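The first citation statement contrasts Kronecker decomposition with SVD on the grounds that it "retains the rank of the matrix". The underlying identity is rank(A ⊗ B) = rank(A) · rank(B): full-rank factors give a full-rank product, whereas a rank-r SVD-style factorization U V is capped at rank r however large the matrix is. The sketch below checks this on toy matrices in plain Python, using exact rational arithmetic so the rank computation is not affected by rounding. The matrices, sizes, and helper names (`kron`, `matmul`, `rank`) are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def kron(A, B):
    """Kronecker product A ⊗ B of two matrices given as nested lists."""
    m2, n2 = len(B), len(B[0])
    return [
        [A[i // m2][j // n2] * B[i % m2][j % n2] for j in range(len(A[0]) * n2)]
        for i in range(len(A) * m2)
    ]


def matmul(X, Y):
    """Plain matrix product X @ Y."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def rank(M):
    """Exact matrix rank via Gauss-Jordan elimination over the rationals."""
    R = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(R), len(R[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == rows:
            break
    return r


# Hypothetical toy factors: A is 2x2 (rank 2), B is 3x3 (rank 3).
A = [[1, 2], [3, 4]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
W_kron = kron(A, B)  # 6x6 matrix stored with 4 + 9 = 13 parameters

# Rank-1 "SVD-style" factorization U @ V with a similar budget: 6 + 6 = 12 parameters.
U = [[1], [2], [0], [1], [3], [1]]
V = [[1, 0, 2, 1, 0, 1]]
W_lowrank = matmul(U, V)

print("Kronecker: params", 4 + 9, "rank", rank(W_kron))    # rank 6 == 2 * 3
print("Low-rank:  params", 6 + 6, "rank", rank(W_lowrank))  # rank 1
```

Rank preservation is a structural property only. Fitting A and B to an actual pretrained weight matrix, and recovering accuracy afterwards, are separate steps not shown here.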


“…As mentioned in the introduction, fewer works in this category have been proposed compared to Knowledge Distillation on encoders. KnGPT2 [33] compresses the embedding and Transformer layers of GPT-2 using Kronecker decomposition. It uses KD to compensate for the performance drop of the compressed model.…”

Section: Knowledge Distillation On Transformer (mentioning)

confidence: 99%

PET: Parameter-efficient Knowledge Distillation on Transformer

et al. 2023


Given a large Transformer model, how can we obtain a small and computationally efficient model which maintains the performance of the original model? Transformer has shown significant performance improvements for many NLP tasks in recent years. However, their large size, expensive computational cost, and long inference time make it challenging to deploy them to resource-constrained devices. Existing Transformer compression methods mainly focus on reducing the size of the encoder ignoring the fact that the decoder takes the major portion of the long inference time. In this paper, we propose PET (Parameter-Efficient knowledge distillation on Transformer), an efficient Transformer compression method that reduces the size of both the encoder and decoder. In PET, we identify and exploit pairs of parameter groups for efficient weight sharing, and employ a warm-up process using a simplified task to increase the gain through Knowledge Distillation. Extensive experiments on five real-world datasets show that PET outperforms existing methods in machine translation tasks. Specifically, on the IWSLT’14 EN→DE task, PET reduces the memory usage by 81.20% and accelerates the inference speed by 45.15% compared to the uncompressed model, with a minor decrease in BLEU score of 0.27.
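The PET citation statement summarizes KnGPT2 as compressing GPT-2's embedding and Transformer layers with Kronecker decomposition and using knowledge distillation (KD) to compensate for the accuracy drop. Two mechanics make this practical, sketched below in plain Python. First, a Kronecker-factored layer can apply W = A ⊗ B to an input without ever building W, because (A ⊗ B)x equals A X Bᵀ flattened row-major, where X is x reshaped row-major to n1 × n2. Second, a temperature-scaled logit-distillation loss can train the compressed student against the original teacher. The helper names, toy shapes, and the choice of a Hinton-style logit KD loss are assumptions for illustration. The KD objective KnGPT2 actually uses (for example, whether it also matches intermediate layers) is not described in this summary.

```python
import math


def kron(A, B):
    """Dense Kronecker product A ⊗ B (used here only as a reference)."""
    m2, n2 = len(B), len(B[0])
    return [
        [A[i // m2][j // n2] * B[i % m2][j % n2] for j in range(len(A[0]) * n2)]
        for i in range(len(A) * m2)
    ]


def dense_matvec(W, x):
    """Reference product W @ x with the full matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def kron_matvec(A, B, x):
    """Compute (A ⊗ B) @ x without building A ⊗ B.

    A is m1 x n1, B is m2 x n2 and len(x) == n1 * n2. With X = x reshaped
    row-major to n1 x n2, the result is A @ X @ B^T flattened row-major.
    """
    n1, n2 = len(A[0]), len(B[0])
    assert len(x) == n1 * n2
    X = [x[j1 * n2:(j1 + 1) * n2] for j1 in range(n1)]
    # T = X @ B^T, shape n1 x m2
    T = [[sum(X[j1][j2] * b_row[j2] for j2 in range(n2)) for b_row in B] for j1 in range(n1)]
    # Y = A @ T, shape m1 x m2
    Y = [[sum(a_row[j1] * T[j1][i2] for j1 in range(n1)) for i2 in range(len(B))] for a_row in A]
    return [v for row in Y for v in row]


def softmax(z, T=1.0):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp((v - m) / T) for v in z]
    s = sum(e)
    return [v / s for v in e]


def kd_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-scaled logit distillation: T^2 * KL(teacher || student)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


# Hypothetical toy layer: A is 3x2, B is 2x3, so A ⊗ B is 6x6.
A = [[1, 2], [0, 1], [3, -1]]
B = [[2, 1, 0], [0, 1, 4]]
x = [1, 2, 3, 4, 5, 6]
y = kron_matvec(A, B, x)
assert y == dense_matvec(kron(A, B), x)
print("factored params:", 3 * 2 + 2 * 3, "dense params:", 6 * 6)  # 12 vs 36

teacher = [2.0, 0.5, -1.0]
print(kd_loss(teacher, teacher))               # 0.0: student matches teacher
print(kd_loss([0.0, 0.0, 0.0], teacher) > 0)   # True
```

The factored product costs about n1·n2·m2 + m1·n1·m2 multiply-adds instead of m1·m2·n1·n2 for the dense product, which is how a Kronecker layer can save compute as well as storage.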


“…GPT's success can largely be attributed to its extensive pre-training on massive amounts of data and its large number of parameters (ranging from 100 million to billions). Although GPT has improved performance (particularly in few-shot and zero-shot settings), its over-parameterized nature makes it difficult to deploy on systems with limited computing capability or storage [20].…”

Section: GPT (mentioning)

confidence: 99%

Evaluating Neural Dialogue Systems Using Deep Learning and Conversation History

AlMutairi; Qamar

2022

Journal on Artificial Intelligence


Neural talk models play a leading role in the growing popular building of conversational managers. A commonplace criticism of those systems is that they seldom understand or use the conversation data efficiently. The development of profound concentration on innovations has increased the use of neural models for a discussion display. In recent years, deep learning (DL) models have achieved significant success in various tasks, and many dialogue systems are also employing DL techniques. The primary issues involved in the generation of the dialogue system are acquiring perspectives into instinctual linguistics, comprehension provision, and conversation assessment. In this paper, we mainly focus on DL-based dialogue systems. The issue to be overcome under this publication would be dialogue supervision, which will determine how the framework responds to recognizing the needs of the user. The dataset utilized in this research is extracted from movies. The models implemented in this research are the seq2seq model, transformers, and GPT while using word embedding and NLP. The results obtained after implementation depicted that all three models produced accurate results. In the modern revolutionized world, the demand for a dialogue system is more than ever. Therefore, it is essential to take the necessary steps to build effective dialogue systems.
