Offsite-Tuning: Transfer Learning without Full Model

Xiao, Guangxuan; Ji, Lin; Han, Song

doi:10.48550/arxiv.2302.04870

Cited by 6 publications

(11 citation statements)

References 35 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We run the same task of a method in the above-mentioned grid search space three times with different random seeds, choose the best result from each run, and report the mean and standard deviation of these best results. For all question-answering tasks, we sweep learning rates in {1, 3, 5, 7}•10 −4 , batch sizes in {8, 16, 32} and the number of epochs in {3, 5, 10}, and keep other settings the same, which is inspired by [48]. The sequence length for all tasks is set to 512, 128, 128 and 128 for BERT base , RoBERTa large , BART large and OPT as our baselines, respectively.…”

Section: Methodsmentioning

confidence: 99%

Debiasing Pre-Trained Language Models via Efficient Fine-Tuning

Michael¹,

Zhang²,

Lee³

2022

Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

View full text Add to dashboard Cite

An explosion in the popularity of transformerbased language models (such as GPT-3, BERT, RoBERTa, and ALBERT) has opened the doors to new machine learning applications involving language modeling, text generation, and more. However, recent scrutiny reveals that these language models contain inherent biases towards certain demographics reflected in their training data. While research has tried mitigating this problem, existing approaches either fail to remove the bias completely, degrade performance ("catastrophic forgetting"), or are costly to execute. This work examines how to reduce gender bias in a GPT-2 language model by fine-tuning less than 1% of its parameters. Through quantitative benchmarks, we show that this is a viable way to reduce prejudice in pre-trained language models while remaining cost-effective at scale.

show abstract

Section: Methodsmentioning

confidence: 99%

Debiasing Pre-Trained Language Models via Efficient Fine-Tuning

Michael¹,

Zhang²,

Lee³

2022

Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

View full text Add to dashboard Cite

show abstract

“…Zaken et al [46] introduced BitFit, a sparse update method that modifies only the bias terms of the model. Xiao et al [42] proposed offsite-tuning, which finetunes the top and bottom layers of model and compresses the large middle layers into a emulator using layer-drop. Lester et al [28] proposed prompt tuning, requiring to store and tune only a small task-specific prompt (i.e., a few tokens) for each downstream task.…”

Section: Efficient Finetuningmentioning

confidence: 99%

DC-CCL: Device-Cloud Collaborative Controlled Learning for Large Vision Models

Ding¹,

Niu²,

Wang³

et al. 2023

Preprint

View full text Add to dashboard Cite

Many large vision models have been deployed on the cloud for real-time services. Meanwhile, fresh samples are continuously generated on the served mobile device. How to leverage the device-side samples to improve the cloud-side large model becomes a practical requirement, but falls into the dilemma of no raw sample up-link and no large model down-link. Specifically, the user may opt out of sharing raw samples with the cloud due to the concern of privacy or communication overhead, while the size of some large vision models far exceeds the mobile device's runtime capacity. In this work, we propose a device-cloud collaborative controlled learning framework, called DC-CCL, enabling a cloud-side large vision model that cannot be directly deployed on the mobile device to still benefit from the device-side local samples. In particular, DC-CCL vertically splits the base model into two submodels, one large submodel for learning from the cloud-side samples and the other small submodel for learning from the device-side samples and performing device-cloud knowledge fusion. Nevertheless, on-device training of the small submodel requires the output of the cloud-side large submodel to compute the desired gradients. DC-CCL thus introduces a light-weight model to mimic the large cloud-side submodel with knowledge distillation, which can be offloaded to the mobile device to control its small submodel's optimization direction. Given the decoupling nature of two submodels in collaborative learning, DC-CCL also allows the cloud to take a pre-trained model and the mobile device to take another model with a different backbone architecture. We extensively evaluate DC-CCL over 5 public datasets and 6 common models, demonstrating its effectiveness and efficiency in approaching the performance of ideally leveraging the large model, as well as its remarkable advantage over the baseline of exploiting only the cloud-side samples or adopting only a device-affordable small model.

show abstract

“…Compared to full-parameter fine-tuning, these PEFT algorithms significantly reduce memory consumption, training time, and communication cost for fine-tuning LLMs. Besides, motivated by the practical concerns on intelligent property protection of LLMs, we also integrate a privacy-preserving fine-tuning algorithm, offsite-tuning (Xiao et al, 2023), for the scenario where clients only tune small adapters based on a distilled model from a full LLM.…”

Section: Overviewmentioning

confidence: 99%

“…To satisfy such practical demand, we adapt a privacy-preserving fine-tuning algorithm, offsitetuning (Xiao et al, 2023), to a federated version, and name it FedOT for short. It sends a lossy compressed model with untrainable parameters to the clients as an emulator of the complete LLM at the beginning of FL.…”

Section: Federated Fine-tuning Without Accessing Full Modelmentioning

confidence: 99%

“…In this section, we investigate the performance of federated fine-tuning LLMs without accessing the full model. As mentioned in Section 4.2, we adapt a privacy-preserving fine-tuning algorithm, offsite-tuning (Xiao et al, 2023), to federated scenarios. Specifically, we use the first and last two layers of LLaMA-7B as the adapter and compress the model as the emulator by dropping 20% and 50% of the remaining layers uniformly.…”

Section: Efficiency Of Peft Algorithms In Fs-llmmentioning

confidence: 99%

See 1 more Smart Citation

FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning

Wang

Kuang

Xie

et al. 2022

Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

View full text Add to dashboard Cite

Large language models (LLMs) have demonstrated great capabilities in various natural language understanding and generation tasks. Platforms such as Hugging Face facilitate access and utilization of the pre-trained LLMs for different entities, ranging from computer science researchers to users with little machine learning background. Different entities can further improve the performance of those LLMs on their specific downstream tasks by fine-tuning LLMs. When several entities have similar interested tasks, but their local data cannot be shared directly because of privacy concerns regulations, federated learning (FL) is a mainstream solution to leverage the data of different entities. Besides avoiding direct data sharing, FL can also achieve rigorous data privacy protection, model intelligent property protection, and model customization via composition with different techniques. However, fine-tuning LLMs in federated learning settings still lacks adequate support from the existing FL frameworks because it has to deal with optimizing the consumption of significant communication and computational resources, various data preparation for different tasks, and distinct information protection demands. This paper first discusses these challenges of federated fine-tuning LLMs in detail, and introduces our implemented package FederatedScope-LLM (FS-LLM) as a main contribution, which consists of the following components: (1) we build a complete end-to-end benchmarking pipeline, automizing the processes of dataset preprocessing, federated fine-tuning execution or simulation, and performance evaluation on federated LLM fine-tuning with different capability demonstration purposes; (2) we provide comprehensive and off-the-shelf federated parameterefficient fine-tuning (PEFT) algorithm implementations and versatile programming interfaces for future extension to enhance the capabilities of LLMs in FL scenarios with low communication and computation costs, even without accessing the full model (e.g., closed-source LLMs); (3) we adopt several accelerating operators and resource-efficient operators for fine-tuning LLMs with limited resources and the flexible pluggable sub-routines for interdisciplinary study (e.g., LLMs in personalized FL). We conduct extensive and reproducible experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-theart parameter-efficient fine-tuning algorithms in a federated setting, which also yields many valuable insights into federated fine-tuning LLMs for the research community. To facilitate further research and adoption, we release FS-LLM at https://github.com/alibaba/FederatedScope/tree/llm. 1

show abstract

Offsite-Tuning: Transfer Learning without Full Model

Cited by 6 publications

References 35 publications

Debiasing Pre-Trained Language Models via Efficient Fine-Tuning

Debiasing Pre-Trained Language Models via Efficient Fine-Tuning

DC-CCL: Device-Cloud Collaborative Controlled Learning for Large Vision Models

FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning

Contact Info

Product

Resources

About