2023
DOI: 10.48550/arxiv.2303.10464
Preprint

SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models

Abstract: The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with cross-domain knowledge (e.g., Pile, MassiveText, etc.) and then fine-tuned on task-specific data (e.g., natural language generation, text summarization, etc.). Scaling the model and dataset size has helped improve the performance of LLMs, but unfortunately, this also leads to h…
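
The abstract (and the paper's title) describe a two-stage workflow: pre-train with sparse weights, then fine-tune densely on the downstream task. The following is a minimal, hypothetical PyTorch sketch of that general pattern, not the authors' implementation: the toy model (TinyLM), the random static masks, the synthetic data, and the 75% sparsity level are all illustrative assumptions.

```python
# Illustrative sketch only: sparse pre-training via a fixed weight mask,
# followed by dense fine-tuning with the mask removed. Not the SPDF code.
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyLM(nn.Module):
    """Toy stand-in for a language model's dense layers (hypothetical)."""
    def __init__(self, d_in=64, d_hidden=256, d_out=64):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def make_masks(model, sparsity=0.75):
    """Random static masks that zero out `sparsity` of each weight matrix."""
    return {name: (torch.rand_like(p) > sparsity).float()
            for name, p in model.named_parameters() if p.dim() == 2}

def apply_masks(model, masks):
    """Force the pruned weights back to zero (used after every update)."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = TinyLM()
loss_fn = nn.MSELoss()

# --- Stage 1: sparse pre-training (most weights held at zero) ---
masks = make_masks(model, sparsity=0.75)
apply_masks(model, masks)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
for step in range(100):
    x = torch.randn(32, 64)                 # toy "pre-training" batch
    loss = loss_fn(model(x), x)             # toy self-supervised objective
    opt.zero_grad(); loss.backward(); opt.step()
    apply_masks(model, masks)               # keep the weights sparse

# --- Stage 2: dense fine-tuning (masks dropped, all weights trainable) ---
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
for step in range(20):
    x, y = torch.randn(8, 64), torch.randn(8, 64)   # toy task-specific data
    loss = loss_fn(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

In this sketch the savings come from stage 1: with a static mask, most weights stay zero throughout pre-training, while the short dense fine-tuning stage updates every weight on the much smaller task-specific dataset.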

Cited by 1 publication (1 citation statement) | References 21 publications
“…Subsequently, a simple and more cost-effective fine-tuning process is sufficient for each specific task. This is possible because the dataset size required for fine-tuning is considerably smaller, reducing time and resource requirements [51].…”
Section: Fine-tuning of Large Language Models (LLMs) | mentioning | confidence: 99%