2018
DOI: 10.1007/978-3-030-01225-0_5

Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights

Abstract: This work presents a method for adapting a single, fixed deep neural network to multiple tasks without affecting performance on already learned tasks. By building upon ideas from network quantization and pruning, we learn binary masks that "piggyback" on an existing network, or are applied to unmodified weights of that network to provide good performance on a new task. These masks are learned in an end-to-end differentiable fashion, and incur a low overhead of 1 bit per network parameter, per task. Even though …
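As a rough illustration of the masking idea summarized in the abstract, the sketch below applies a learned binary mask to a frozen convolution weight, with gradients passed to real-valued mask scores through a straight-through estimator. This is a minimal sketch, not the authors' code: the names Binarize and MaskedConv2d, the threshold of 5e-3, and the 1e-2 score initialization are illustrative assumptions.

# Minimal sketch (assumed names/values, not the paper's released code):
# a frozen backbone weight is modulated elementwise by a learned binary
# mask, trained end to end via a straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Binarize(torch.autograd.Function):
    """Hard-threshold real-valued mask scores; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, threshold):
        return (scores > threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: gradient flows to the real-valued scores;
        # no gradient for the threshold.
        return grad_output, None

class MaskedConv2d(nn.Module):
    """Conv layer whose pretrained weight is frozen; only the mask scores train."""

    def __init__(self, pretrained_conv, threshold=5e-3):
        super().__init__()
        # Frozen backbone weights (and bias, if present).
        self.weight = nn.Parameter(pretrained_conv.weight.data.clone(),
                                   requires_grad=False)
        self.bias = None
        if pretrained_conv.bias is not None:
            self.bias = nn.Parameter(pretrained_conv.bias.data.clone(),
                                     requires_grad=False)
        self.stride = pretrained_conv.stride
        self.padding = pretrained_conv.padding
        self.threshold = threshold
        # Real-valued mask scores, one per weight, initialized above the
        # threshold so the initial masked network matches the pretrained one.
        self.mask_scores = nn.Parameter(torch.full_like(self.weight, 1e-2))

    def forward(self, x):
        mask = Binarize.apply(self.mask_scores, self.threshold)  # {0, 1} per weight
        return F.conv2d(x, self.weight * mask, self.bias,
                        stride=self.stride, padding=self.padding)

Wrapping each convolution of a shared backbone this way, only the binary mask needs to be stored per task: at 1 bit per parameter, a mask for a roughly 25M-parameter backbone such as ResNet-50 costs on the order of 3 MB per task, compared with roughly 100 MB for a separate float32 copy of the weights.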

Cited by 456 publications (390 citation statements)
References 26 publications
“…The first protocol described in [38] consists of five datasets: CUB [67], Stanford Cars [25], Oxford Flowers [46], WikiArt [56] and Sketch [10]. Following [38], we use the same train/test set splits, and apply the NETTAILOR procedure to the same backbone network, ResNet50, with an input size 224x224. The second protocol is the visual decathlon benchmark [49] and consists of ten different datasets including ImageNet, Omniglot, German Traffic Signs, among others.…”
Section: Comparison To Prior Work (mentioning, confidence: 99%)
“…While feature extraction shares most weights across datasets, differences between the source and target domains cannot be corrected, thus achieving low performance. More refined methods, such as PackNet [39] and Piggyback [38], try to selectively adjust the network weights in order to remember previous tasks, or freeze the backbone network and learn a small set of task-specific parameters (a set of masking weights in the case of Piggyback) that is used to bridge the gap between source and target tasks. All these methods ignore the fact that source and target datasets can differ in terms of difficulty, and thus the architecture itself should be adjusted to the target task, not just the weights.…”
Section: Comparison To Prior Work (mentioning, confidence: 99%)
“…The weight distribution after this step is shown in Figure 3d. It is worth mentioning that previous works, such as PackNet [22] and Piggyback [21], prune the secondary parameters and thus distort the weight distribution (Figure 3b). At epoch 300, task T2 appears and updates the parameters.…”
Section: In-depth Analysis (mentioning, confidence: 99%)
“…In this case, the rest of the parameters no longer contain prior knowledge, violating the aforementioned properties of an ideal continual learning system. For instance, PackNet [21] and Piggyback [22] achieve strong performance on multi-head evaluation but not on single-head.…”
Section: Introduction (mentioning, confidence: 99%)