2020
DOI: 10.48550/arxiv.2001.06782
Preprint

Gradient Surgery for Multi-Task Learning

Cited by 84 publications (138 citation statements)
References 0 publications
“…Reweighting has also become popular in multitask learning (Chen et al., 2018; Kendall et al., 2018), where different tasks must be balanced with each other for optimal training. Multitask learning has also popularized gradient comparison techniques (Yu et al., 2020; Chen et al., 2020), which we leverage heavily within this current work.…”
Section: Related Work (mentioning)
confidence: 99%
“…Sharing parameters across tasks [Parisotto et al., 2015, Rusu et al., 2015, Teh et al., 2017] usually results in conflicting gradients from different tasks. One way to mitigate this is to explicitly model the similarity between gradients obtained from different tasks [Yu et al., 2020, Zhang and Yeung, 2014, Kendall et al., 2018, Lin et al., 2019, Sener and Koltun, 2018, Du et al., 2018]. On the other hand, researchers propose to utilize different modules for different tasks, thus reducing the interference of gradients from different tasks [Singh, 1992, Andreas et al., 2017, Rusu et al., 2016, Qureshi et al., 2019, Peng et al., 2019, Haarnoja et al., 2018, Sahni et al., 2017].…”
Section: Related Work (mentioning)
confidence: 99%
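The gradient-similarity idea in the excerpt above is commonly operationalised as a sign test on the inner product (equivalently, the cosine similarity) of per-task gradients. Below is a minimal sketch, assuming flattened NumPy gradient vectors; the function name and tolerance are ours, not taken from any cited work.

    import numpy as np

    def grads_conflict(grad_a: np.ndarray, grad_b: np.ndarray) -> bool:
        """Two task gradients 'conflict' when their cosine similarity is
        negative, i.e. a step along one would, to first order, increase
        the other task's loss."""
        denom = np.linalg.norm(grad_a) * np.linalg.norm(grad_b) + 1e-12
        return float(np.dot(grad_a, grad_b) / denom) < 0.0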
“…Multi-task learning is notoriously difficult [Caruana, 1997, Ruder, 2017], and Yu et al. [2020] hypothesize that the optimization difficulties might be due to the gradients from different tasks conflicting with each other, thus hurting the learning process. In this work, we propose a multi-task bilevel learning framework for more effective multi-objective curricula DRL learning.…”
Section: Introduction (mentioning)
confidence: 99%
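The "gradient surgery" of Yu et al. [2020] (PCGrad) addresses exactly this conflict: each task gradient is projected onto the normal plane of any other task gradient it conflicts with, and the projected gradients are summed. The sketch below is a minimal NumPy rendering of that projection step, assuming flattened gradient vectors; helper names and the seeding scheme are ours.

    import numpy as np

    def pcgrad_combine(task_grads: list[np.ndarray], seed: int = 0) -> np.ndarray:
        """Project each task gradient onto the normal plane of every other
        task gradient it conflicts with (negative inner product), visiting
        the other tasks in random order, then sum the projected results."""
        rng = np.random.default_rng(seed)
        combined = np.zeros_like(task_grads[0])
        for i, g in enumerate(task_grads):
            g = g.copy()
            others = [j for j in range(len(task_grads)) if j != i]
            for j in rng.permutation(others):
                dot = np.dot(g, task_grads[j])
                if dot < 0.0:  # conflicting: remove the component along task j
                    g -= dot / (np.dot(task_grads[j], task_grads[j]) + 1e-12) * task_grads[j]
            combined += g
        return combined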
“…We do so by alternating gradient updates on batches sampled from each dataset in turn. Further details are in Appendix E.
• CoTrain + PCGrad: An extension of CoTrain, where we leverage the method PCGrad [72] to perform gradient projection and prevent destructive gradient interference between updates from D_PT and D_FT. Further details and variants we tried are in Appendix E.…”
Section: Problem Setup (mentioning)
confidence: 99%
“…CoTrain + PCGrad details: In our implementation, we computed gradient updates using a batch of data from D_PT and D_FT separately, averaging the losses across the set of binary tasks in each dataset (5000 for D_PT and 40 for D_FT). PCGrad [72] was then used to compute the final gradient update given these two averaged losses. We also experimented with: (1) computing the overall update using all 5040 tasks (rather than averaging), but this was too memory expensive; and (2) computing the overall update using an average over the 5000 PT tasks and each of the 40 FT tasks individually, but this was unstable and did not converge.…”
Section: E.2 Further Experimental Details, E.2.1 Baselines (mentioning)
confidence: 99%
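As a rough illustration of the two-averaged-loss setup described above, PCGrad with only two gradients reduces to a single mutual projection whenever the two conflict. The self-contained sketch below uses stand-in gradient vectors; the names, sizes, and random values are hypothetical and only meant to show the shape of the computation.

    import numpy as np

    def two_loss_pcgrad(grad_pt: np.ndarray, grad_ft: np.ndarray) -> np.ndarray:
        """PCGrad specialised to two averaged losses: if the gradients
        conflict (negative inner product), project each onto the normal
        plane of the other before summing; otherwise simply sum them."""
        out_pt, out_ft = grad_pt.copy(), grad_ft.copy()
        dot = np.dot(grad_pt, grad_ft)
        if dot < 0.0:
            out_pt -= dot / (np.dot(grad_ft, grad_ft) + 1e-12) * grad_ft
            out_ft -= dot / (np.dot(grad_pt, grad_pt) + 1e-12) * grad_pt
        return out_pt + out_ft

    # Toy usage: stand-ins for the gradient of the loss averaged over the
    # 5000 pre-training tasks and the 40 fine-tuning tasks, respectively.
    rng = np.random.default_rng(0)
    grad_pt = rng.normal(size=1_000)
    grad_ft = rng.normal(size=1_000)
    update = two_loss_pcgrad(grad_pt, grad_ft)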