Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
DOI: 10.18653/v1/2022.emnlp-main.91
Numerical Optimizations for Weighted Low-rank Estimation on Language Models

Cited by 24 publications (55 citation statements). References 0 publications.
“…7. Training by reward network: The successful application of Human-feedback Reinforcement Learning (HFRL) to the training process for LLMs has been established [93], and we borrow ideas from [94,95]. We consider cell types as the human labels for different cells, akin to the labels of sentences in the NLP area.…”
Section: B Initial Settings (mentioning)
confidence: 99%
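To illustrate the reward-network idea in the excerpt above, here is a minimal PyTorch sketch in which cell-type labels stand in for human feedback when training a small reward head over cell embeddings. All names and dimensions (CellTypeReward, embed_dim, n_cell_types) are hypothetical, not taken from the cited paper.

```python
# Hypothetical sketch of the "reward network" idea: a small head over
# cell embeddings is trained with cell-type labels standing in for
# human feedback. Names and sizes are illustrative, not from the paper.
import torch
import torch.nn as nn

class CellTypeReward(nn.Module):
    def __init__(self, embed_dim: int, n_cell_types: int):
        super().__init__()
        self.head = nn.Linear(embed_dim, n_cell_types)

    def forward(self, cell_embeddings: torch.Tensor) -> torch.Tensor:
        return self.head(cell_embeddings)  # logits over cell types

reward_net = CellTypeReward(embed_dim=128, n_cell_types=10)
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

embeddings = torch.randn(32, 128)     # a batch of cell embeddings
labels = torch.randint(0, 10, (32,))  # cell-type "human" labels
optimizer.zero_grad()
loss = nn.functional.cross_entropy(reward_net(embeddings), labels)
loss.backward()
optimizer.step()
```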
“…By making adjustments to a limited set of parameters, these techniques avoid (potentially costly) modifications to the much larger backbone architecture. There are two primary methods for parameter-efficient (PE) tuning: (i) training a subset of model parameters, usually by placing a linear probe on top of pretrained features [37], and (ii) integrating small modules within the network [28,39,15,21,17].…”
Section: Related Work (mentioning)
confidence: 99%
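As an illustration of method (i), here is a minimal PyTorch sketch of linear probing: the pretrained backbone is frozen and only a linear head over its features is trained. The backbone and all dimensions are stand-ins, not the architecture from the cited work.

```python
# A minimal sketch of linear probing: freeze a pretrained backbone and
# train only a linear head on its features. The backbone here is a
# stand-in, not the architecture used in the cited work.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(784, 256), nn.ReLU())  # stand-in encoder
for p in backbone.parameters():
    p.requires_grad = False  # the (much larger) backbone stays untouched

probe = nn.Linear(256, 10)  # the only trainable parameters
optimizer = torch.optim.SGD(probe.parameters(), lr=1e-2)

x = torch.randn(8, 784)
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    feats = backbone(x)  # pretrained features
optimizer.zero_grad()
loss = nn.functional.cross_entropy(probe(feats), y)
loss.backward()
optimizer.step()  # updates the probe only
```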
“…We evaluated several modules for the modular-update method, including Adapter [16], LoRA [17], and VPT [21]. Due to space limitations, we include only the results of the Adapter method in the main paper; the results for the LoRA and VPT methods are similar and are relegated to supplementary material B.3.…”
Section: Modules (mentioning)
confidence: 99%
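For reference, a minimal sketch of the Houlsby-style adapter bottleneck cited as [16]: a down-projection, a nonlinearity, an up-projection, and a residual connection, trained while the surrounding transformer stays frozen. The dimensions and the zero initialization of the up-projection are illustrative choices, not values from the cited experiments.

```python
# A sketch of a Houlsby-style adapter bottleneck: down-project, apply a
# nonlinearity, up-project, and add a residual connection. The zero
# initialization makes the module start as an identity mapping.
# Dimensions are illustrative.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)  # identity at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))  # residual bottleneck

h = torch.randn(2, 16, 768)    # (batch, seq, d_model) hidden states
out = Adapter(d_model=768)(h)  # inserted inside an otherwise frozen block
```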
“…This shows its greatest downside: AdapterFusion is trained for one task only. Hu et al. (2021) argue that the original adapter bottleneck design (Houlsby et al., 2019) introduces inference latency because the adapters are processed sequentially, whereas large language models (LLMs) rely on hardware parallelism. Their approach, LoRA (Low-Rank Adaptation), modifies the attention weights of the query and value projection matrices by introducing trainable low-rank decomposition matrices in parallel to the original computation.…”
Section: Adapters (mentioning)
confidence: 99%
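A minimal PyTorch sketch of the LoRA scheme the excerpt describes: the pretrained projection is frozen, and a trainable low-rank pair B·A runs in parallel with the alpha/r scaling from Hu et al. (2021). Class and variable names here are illustrative.

```python
# A sketch of LoRA as described above: the pretrained projection W is
# frozen, and a trainable low-rank update B @ A runs in parallel, so the
# output is W x + (alpha / r) * B A x. The zero init of B means the
# model starts exactly at its pretrained behavior.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad = False  # frozen pretrained weight
        self.base.bias.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

# Applied to the query and value projections of an attention layer:
q_proj = LoRALinear(768, 768)
q = q_proj(torch.randn(2, 16, 768))  # parallel path adds no sequential depth
```

Because the low-rank path is computed in parallel rather than stacked after the layer, it avoids the sequential latency the excerpt attributes to bottleneck adapters.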
“…Finally, model updating due to distribution shift, new data, or business requirements (see Section 4.4) seems most plausible in the setting where prompts are continuous and tuned (Li & Liang, 2021). However, this has downsides, such as difficult optimization, non-monotonic performance change with respect to the number of parameters, and reserving part of the sequence length for adaptation (Hu et al., 2021).…”
Section: B Additional Multi-task Learning Approaches (mentioning)
confidence: 99%
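A minimal sketch of the continuous-prompt setting from Li & Liang (2021) that makes the last downside concrete: trainable prompt embeddings are prepended to the input embeddings, so they consume part of the model's fixed sequence budget. All sizes are hypothetical.

```python
# A sketch of continuous prompt tuning: trainable prompt embeddings are
# prepended to the token embeddings, consuming part of the sequence
# budget, which is the last downside mentioned above. Sizes are
# hypothetical.
import torch
import torch.nn as nn

d_model, prompt_len, max_len = 768, 20, 512
prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)  # only trained weights

token_embeds = torch.randn(2, max_len - prompt_len, d_model)  # 20 positions reserved
inputs = torch.cat([prompt.expand(2, -1, -1), token_embeds], dim=1)
assert inputs.shape == (2, max_len, d_model)  # prompt counts toward the budget
```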