Deep Model Assembling

Ni, Zanlin; Wang, Yulin; Yu, Jiangwei; Jiang, Haojun; Cao, Yong; Huang, Gao

doi:10.48550/arxiv.2212.04129

Cited by 2 publications

(2 citation statements)

References 0 publications

“…Decoupled Learning breaks down the end-to-end optimization problem of neural network training into smaller subproblems. This is achieved through various techniques such as the use of auxiliary variables (Askari et al, 2018; Li et al, 2019; Taylor et al, 2016; Zhang & Brand, 2017), delayed gradient descent (Huo et al, 2018; Xu et al, 2020), and model assembly (Ni et al, 2022). However, current decoupled learning methods have primarily been developed for training neural networks from scratch and have not yet been extensively explored for fine-tuning already trained large foundation models.…”

Section: Related Work (mentioning)

confidence: 99%

Offsite-Tuning: Transfer Learning without Full Model

Xiao, Ji, Han

2023

Preprint


Transfer learning is important for foundation models to adapt to downstream tasks. However, many foundation models are proprietary, so users must share their data with model owners to fine-tune the models, which is costly and raises privacy concerns. Moreover, fine-tuning large foundation models is computation-intensive and impractical for most downstream users. In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model. In offsite-tuning, the model owner sends a lightweight adapter and a lossy compressed emulator to the data owner, who then fine-tunes the adapter on the downstream data with the emulator's assistance. The fine-tuned adapter is then returned to the model owner, who plugs it into the full model to create an adapted foundation model. Offsite-tuning preserves both parties' privacy and is computationally more efficient than existing fine-tuning methods that require access to the full model weights. We demonstrate the effectiveness of offsite-tuning on various large language and vision foundation models. Offsite-tuning can achieve accuracy comparable to full model fine-tuning while being privacy-preserving and efficient, achieving 6.5× speedup and 5.6× memory reduction. Code is available here.


“…Some existing work on cloud-based distributed learning studied model parallelism methods. Ni et al [33] proposed to divide the model horizontally into submodels, train them in parallel, and link them for final finetuning. Krizhevsky et al [26] proposed group convolution to facilitate model parallelism, which splits the channels of a convolutional layer into multiple groups and performs convolution within each group, thus decoupling their forward processes.…”

Section: Decoupled Structure (mentioning)

confidence: 99%
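
The group convolution mentioned in the citation statement above can be illustrated with a small sketch. This is not code from any of the cited papers; it assumes a 1×1 (pointwise) kernel applied at a single spatial position so the channel split is easy to see, and the function names `pointwise_conv`, `grouped_pointwise_conv`, and `block_diagonal` are hypothetical. It shows that each group only reads its own slice of input channels, which is equivalent to a dense layer whose weight matrix is block-diagonal.

```python
def pointwise_conv(x, w):
    """1x1 convolution at one spatial position.

    x: list of C_in input channel values.
    w: C_out rows, each a list of C_in weights.
    """
    return [sum(row[c] * x[c] for c in range(len(x))) for row in w]


def grouped_pointwise_conv(x, group_weights):
    """Split the input channels into equal groups and convolve each group
    with its own weights; the per-group outputs are concatenated."""
    groups = len(group_weights)
    if len(x) % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    size = len(x) // groups
    out = []
    for g, w in enumerate(group_weights):
        # Group g sees only channels [g*size, (g+1)*size): no cross-group terms.
        out.extend(pointwise_conv(x[g * size:(g + 1) * size], w))
    return out


def block_diagonal(group_weights, size):
    """Build the dense weight matrix equivalent to the grouped layer."""
    total_in = size * len(group_weights)
    dense = []
    for g, w in enumerate(group_weights):
        for row in w:
            dense.append([0.0] * (g * size) + list(row)
                         + [0.0] * (total_in - (g + 1) * size))
    return dense


x = [1.0, 2.0, 3.0, 4.0]
group_weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # weights for channels 0-1
    [[2.0, 0.0], [1.0, 1.0]],  # weights for channels 2-3
]
grouped = grouped_pointwise_conv(x, group_weights)
dense = pointwise_conv(x, block_diagonal(group_weights, 2))
```

Because the groups share no weights or inputs, each group's forward pass could run on a separate device, which is the decoupling the statement refers to.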

DC-CCL: Device-Cloud Collaborative Controlled Learning for Large Vision Models

Ding, Niu, Wang et al.

2023

Preprint


Many large vision models have been deployed on the cloud for real-time services. Meanwhile, fresh samples are continuously generated on the served mobile device. How to leverage the device-side samples to improve the cloud-side large model becomes a practical requirement, but falls into the dilemma of no raw sample up-link and no large model down-link. Specifically, the user may opt out of sharing raw samples with the cloud due to the concern of privacy or communication overhead, while the size of some large vision models far exceeds the mobile device's runtime capacity. In this work, we propose a device-cloud collaborative controlled learning framework, called DC-CCL, enabling a cloud-side large vision model that cannot be directly deployed on the mobile device to still benefit from the device-side local samples. In particular, DC-CCL vertically splits the base model into two submodels, one large submodel for learning from the cloud-side samples and the other small submodel for learning from the device-side samples and performing device-cloud knowledge fusion. Nevertheless, on-device training of the small submodel requires the output of the cloud-side large submodel to compute the desired gradients. DC-CCL thus introduces a light-weight model to mimic the large cloud-side submodel with knowledge distillation, which can be offloaded to the mobile device to control its small submodel's optimization direction. Given the decoupling nature of two submodels in collaborative learning, DC-CCL also allows the cloud to take a pre-trained model and the mobile device to take another model with a different backbone architecture. We extensively evaluate DC-CCL over 5 public datasets and 6 common models, demonstrating its effectiveness and efficiency in approaching the performance of ideally leveraging the large model, as well as its remarkable advantage over the baseline of exploiting only the cloud-side samples or adopting only a device-affordable small model.

