Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.418

Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators

Abstract: This paper presents a novel pre-trained language model (PLM) compression approach based on the matrix product operator (MPO for short) from quantum many-body physics. It can decompose an original matrix into central tensors (containing the core information) and auxiliary tensors (with only a small proportion of parameters). With the decomposed MPO structure, we propose a novel fine-tuning strategy by only updating the parameters from the auxiliary tensors, and design an optimization algorithm for MPO-based app…
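
As a rough illustration of the idea in the abstract, the sketch below factors a dense weight matrix into a chain of local tensors via sequential truncated SVDs, in the spirit of an MPO/tensor-train decomposition: the middle tensor plays the role of the central tensor and the outer ones the auxiliary tensors. The three-core split, the shapes, and the rank cap are assumptions chosen for illustration, not the paper's actual configuration.

```python
# Minimal sketch of an MPO-style factorization of a weight matrix via
# sequential truncated SVDs. The 3-core split, shapes, and rank cap are
# illustrative assumptions, not the paper's settings.
import numpy as np

def mpo_decompose(W, in_shape, out_shape, max_rank=64):
    """Factor W of shape (prod(in_shape), prod(out_shape)) into local tensors
    T_k of shape (r_{k-1}, i_k, j_k, r_k); rank truncation makes it approximate."""
    n = len(in_shape)
    # Reshape to (i1..in, j1..jn), then interleave so each (i_k, j_k) pair is adjacent.
    T = W.reshape(*in_shape, *out_shape)
    T = T.transpose([p for k in range(n) for p in (k, k + n)])
    cores, r_prev = [], 1
    for k in range(n - 1):
        mat = T.reshape(r_prev * in_shape[k] * out_shape[k], -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, in_shape[k], out_shape[k], r))
        T = S[:r, None] * Vt[:r]          # carry the remainder to the next split
        r_prev = r
    cores.append(T.reshape(r_prev, in_shape[-1], out_shape[-1], 1))
    return cores

# Example: a 768 x 3072 feed-forward weight, split into 3 local tensors.
W = np.random.randn(768, 3072)
cores = mpo_decompose(W, in_shape=(4, 12, 16), out_shape=(8, 16, 24))
central, auxiliary = cores[1], [cores[0], cores[2]]
print([c.shape for c in cores])
print("trainable if only auxiliary tensors are updated:",
      sum(a.size for a in auxiliary), "of", W.size)
```

Freezing the large central tensor and updating only the two small auxiliary tensors is what makes the fine-tuning lightweight in this sketch: only a few tens of thousands of the roughly 2.4M original parameters remain trainable.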

Cited by 6 publications (6 citation statements). References 25 publications.

“…An example of such clever data compression schemes based on TN and MPO decomposition has already been introduced in the previous subsection. Using MPO as an efficient representation for weight matrices of a NN was originally suggested in the ML community under the name tensor trains [44] and later reintroduced in other contexts for systematic compression of fully connected NN models [34,45], for solving partial differential equations with NNs [31] and for language models [46] and speech processing [47].…”
Section: B Tensorizing Standard Neural Network
confidence: 99%
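
To make the compression mentioned in this statement concrete, here is a back-of-the-envelope parameter count for one fully connected layer represented as an MPO/tensor-train chain; the factorization of the input/output dimensions and the bond dimension D are assumptions chosen only for illustration.

```python
# Back-of-the-envelope parameter count: dense layer vs. MPO/tensor-train chain.
# The index factorization and the bond dimension D are illustrative assumptions.
def mpo_param_counts(in_shape, out_shape, bond_dim):
    counts, r_prev = [], 1
    for k, (i, j) in enumerate(zip(in_shape, out_shape)):
        r_next = 1 if k == len(in_shape) - 1 else bond_dim
        counts.append(r_prev * i * j * r_next)   # core T_k has shape (r_{k-1}, i, j, r_k)
        r_prev = r_next
    return counts

in_shape, out_shape, D = (4, 12, 16), (8, 16, 24), 64   # a 768 -> 3072 layer
dense = 768 * 3072
cores = mpo_param_counts(in_shape, out_shape, D)
print(f"dense: {dense}, MPO total: {sum(cores)}, per core: {cores}")
```

With these assumed shapes the chain stores roughly a third of the dense layer's parameters, and the savings grow as the bond dimension is reduced.
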
“…tensor-train operators (Oseledets, 2011) were proposed for a more effective representation of the linear structure of neural networks, which was used to compress deep neural networks (Novikov et al., 2015), convolutional neural networks (Garipov et al., 2016; Yu et al., 2017), and LSTMs (Gao et al., 2020b; Sun et al., 2020a). Based on MPO decomposition, recent studies designed lightweight fine-tuning and compression methods for PLMs (Liu et al., 2021), developed a parameter-efficient MoE architecture (Gao et al., 2022), over-parameterized PLMs, and empirically studied the emergent abilities of quantized large language models. Unlike these works, our work aims to develop a very deep PLM with a lightweight architecture and stable training.…”
Section: Related Work
confidence: 99%
“…Second, it should not affect the capacity to capture layer-specific variations. To achieve this, we utilize the MPO decomposition (Liu et al., 2021) to develop a parameter-efficient architecture by sharing informative components across layers and keeping layer-specific supplementary components (Section 3.2). As another potential issue, it is difficult to optimize deep PLMs due to unstable training (Wang et al., 2022b), especially when weight sharing (Lan et al., 2019) is involved.…”
Section: Overview Of Our Approach
confidence: 99%
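
The cross-layer sharing described in this excerpt can be pictured with a small sketch: one central tensor is shared by every layer, while each layer keeps its own much smaller auxiliary tensors. All names and shapes below are hypothetical and only meant to convey the parameter bookkeeping, not the cited architecture.

```python
# Hypothetical sketch of sharing a central tensor across layers while keeping
# layer-specific auxiliary tensors; names and shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
num_layers, D = 12, 64

# One central tensor (the informative component), created once and tied.
shared_central = rng.standard_normal((D, 12, 16, D))

layers = [
    {
        "central": shared_central,                        # same array object in every layer
        "aux_left": rng.standard_normal((1, 4, 8, D)),    # layer-specific supplements
        "aux_right": rng.standard_normal((D, 16, 24, 1)),
    }
    for _ in range(num_layers)
]

layer_specific = sum(l["aux_left"].size + l["aux_right"].size for l in layers)
total = layer_specific + shared_central.size
print(f"layer-specific params: {layer_specific}, total params: {total}")
```

Because the central tensor is stored once rather than per layer, stacking more layers adds only the small auxiliary tensors, which is the parameter-efficiency argument the excerpt makes.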