“…Sharing parameters across tasks [Parisotto et al., 2015; Rusu et al., 2015; Teh et al., 2017] usually results in conflicting gradients from different tasks. One way to mitigate this is to explicitly model the similarity between the gradients obtained from different tasks [Yu et al., 2020; Zhang and Yeung, 2014; Kendall et al., 2018; Lin et al., 2019; Sener and Koltun, 2018; Du et al., 2018]. Alternatively, researchers have proposed using separate modules for different tasks, which reduces the interference between their gradients [Singh, 1992; Andreas et al., 2017; Rusu et al., 2016; Qureshi et al., 2019; Peng et al., 2019; Haarnoja et al., 2018; Sahni et al., 2017].…”