2018
DOI: 10.1007/978-3-030-01225-0_5

Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

Abstract: This work presents a method for adapting a single, fixed deep neural network to multiple tasks without affecting performance on already learned tasks. By building upon ideas from network quantization and pruning, we learn binary masks that "piggyback" on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task. These masks are learned in an end-to-end differentiable fashion, and incur a low overhead of 1 bit per network parameter, per task. Even though …
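As a rough illustration of the masking idea summarized in the abstract, the sketch below applies a learned binary mask to a frozen convolution weight, with gradients passed to real-valued mask scores through a straight-through estimator. This is a minimal sketch, not the authors' code: the names Binarize and MaskedConv2d, the threshold of 5e-3, and the 1e-2 score initialization are illustrative assumptions.

# Minimal sketch (assumed names/values, not the paper's released code):
# a frozen backbone weight is modulated elementwise by a learned binary
# mask, trained end to end via a straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Binarize(torch.autograd.Function):
    """Hard-threshold real-valued mask scores; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, threshold):
        return (scores > threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: gradient flows to the real-valued scores;
        # no gradient for the threshold.
        return grad_output, None

class MaskedConv2d(nn.Module):
    """Conv layer whose pretrained weight is frozen; only the mask scores train."""

    def __init__(self, pretrained_conv, threshold=5e-3):
        super().__init__()
        # Frozen backbone weights (and bias, if present).
        self.weight = nn.Parameter(pretrained_conv.weight.data.clone(),
                                   requires_grad=False)
        self.bias = None
        if pretrained_conv.bias is not None:
            self.bias = nn.Parameter(pretrained_conv.bias.data.clone(),
                                     requires_grad=False)
        self.stride = pretrained_conv.stride
        self.padding = pretrained_conv.padding
        self.threshold = threshold
        # Real-valued mask scores, one per weight, initialized above the
        # threshold so the initial masked network matches the pretrained one.
        self.mask_scores = nn.Parameter(torch.full_like(self.weight, 1e-2))

    def forward(self, x):
        mask = Binarize.apply(self.mask_scores, self.threshold)  # {0, 1} per weight
        return F.conv2d(x, self.weight * mask, self.bias,
                        stride=self.stride, padding=self.padding)

Wrapping each convolution of a shared backbone this way, only the binary mask needs to be stored per task: at 1 bit per parameter, a mask for a roughly 25M-parameter backbone such as ResNet-50 costs on the order of 3 MB per task, compared with roughly 100 MB for a separate float32 copy of the weights.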

Cited by 456 publications (390 citation statements)
References 26 publications
“…The first protocol described in [38] consists of five datasets: CUB [67], Stanford Cars [25], Oxford Flowers [46], WikiArt [56] and Sketch [10]. Following [38], we use the same train/test set splits, and apply the NETTAILOR procedure to the same backbone network, ResNet50, with an input size 224x224. The second protocol is the visual decathlon benchmark [49] and consists of ten different datasets including ImageNet, Omniglot, German Traffic Signs, among others.…”
Section: Comparison To Prior Work (mentioning, confidence: 99%)
“…While feature extraction shares most weights across datasets, differences between the source and target domains cannot be corrected, thus achieving low performance. More refined methods, such as PackNet [39] and Piggyback [38], try to selectively adjust the network weights in order to remember previous tasks, or freeze the backbone network and learn a small set of task-specific parameters (a set of masking weights in the case of Piggyback) that is used to bridge the gap between source and target tasks. All these methods ignore the fact that source and target datasets can differ in terms of difficulty, and thus the architecture itself should be adjusted to the target task, not just the weights.…”
Section: Comparison To Prior Work (mentioning, confidence: 99%)
“…The weight distribution after this step is shown in Figure 3d. It is worth mentioning that previous works, such as PackNet [22] and Piggyback [21], prune the secondary parameters and thus distort the weight distribution (Figure 3b). At epoch 300, task T2 appears and updates the parameters.…”
Section: In-depth Analysis (mentioning, confidence: 99%)
“…In this case, the rest of the parameters no longer contain prior knowledge, violating the aforementioned properties of an ideal continual learning system. For instance, PackNet [21] and Piggyback [22] achieve strong performance on multi-head evaluation but not on single-head.…”
Section: Introduction (mentioning, confidence: 99%)