2017
DOI: 10.1007/978-3-319-70139-4_10
Learning with Partially Shared Features for Multi-Task Learning

Cited by 10 publications (15 citation statements)
References 10 publications
“…Compared methods. We compare BayesAgg-MTL with the following baseline methods: (1) Single Task Learning (STL), which learns each task independently under the same experimental setup as the MTL methods; (2) Linear Scalarization (LS), which assigns a uniform weight to all tasks, i.e., minimizes ∑_{k=1}^{K} ℓ_k; (3) Scale-Invariant (SI) (Navon et al., 2022), which assigns a uniform weight to the log of all tasks, i.e., minimizes ∑_{k=1}^{K} log ℓ_k; (4) Random Loss Weighting (RLW) (Lin et al., 2022), which allocates random weights to the losses at each iteration; (5) Dynamic Weight Average (DWA) (Liu et al., 2019a), which allocates each task a weight based on the rate of change of its loss; (6) Uncertainty Weighting (UW) (Kendall et al., 2018), which minimizes a scalar term corresponding to the aleatoric uncertainty of each task; (7) Multiple-Gradient Descent Algorithm (MGDA) (Désidéri, 2012; Sener & Koltun, 2018), which finds a minimum-norm solution over convex combinations of the losses; (8) Projecting Conflicting Gradients (PCGrad) (Yu et al., 2020), which projects the gradient of each task onto the normal plane of the tasks it conflicts with; (9) Conflict-Averse Grad (CAGrad) (Liu et al., 2021), which searches for an update direction centered at the LS solution while minimizing conflicts among gradients; (10) Impartial MTL-Grad (IMTL-G) (Liu et al., 2020), which finds an update vector whose projection onto each task's gradient is equal; (11) Nash-MTL (Navon et al., 2022), which formulates MTL as a bargaining game; (12) (Dai et al., 2023), which suggests a reinforcement learning procedure to balance the task losses; (13) Aligned-MTL-UB (Senushkin et al., 2023), which aligns the principal components of a gradient matrix.…”
Section: Methods (mentioning; confidence: 99%)
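Several of the loss-weighting baselines enumerated above reduce to a few lines each. A minimal NumPy sketch of LS, SI, RLW, and DWA, using my own function names rather than any implementation from the paper:

```python
import numpy as np

def linear_scalarization(losses):
    # LS: uniform weights over the raw task losses, sum_k l_k
    return float(np.sum(losses))

def scale_invariant(losses):
    # SI: uniform weights over the log-losses, sum_k log l_k
    return float(np.sum(np.log(losses)))

def random_loss_weighting(losses, rng):
    # RLW: sample fresh random weights each iteration (softmax of Gaussians)
    w = rng.standard_normal(len(losses))
    w = np.exp(w) / np.exp(w).sum()
    return float(np.dot(w, losses))

def dynamic_weight_average(prev_losses, prev_prev_losses, T=2.0):
    # DWA: weight each task by the recent rate of change of its loss;
    # the K weights are normalized to sum to K (Liu et al., 2019a).
    r = np.asarray(prev_losses, dtype=float) / np.asarray(prev_prev_losses, dtype=float)
    return len(r) * np.exp(r / T) / np.exp(r / T).sum()
```

Each function returns a scalar objective (or, for DWA, a weight vector) that would then be backpropagated in place of a single-task loss.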
“…Nash-MTL (Navon et al., 2022) suggests treating MTL as a bargaining game to find Pareto-optimal solutions. Several studies have suggested adaptations of the multiple-gradient descent algorithm (MGDA) (Désidéri, 2012; Sener & Koltun, 2018), such as CAGrad (Liu et al., 2021) and MoCo (Fernando et al., 2023). In contrast to previous methods, our approach considers both the mean and the variance of the gradients to derive an update direction.…”
Section: Related Work (mentioning; confidence: 99%)
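To make the gradient-surgery family concrete, here is a minimal sketch of a PCGrad-style projection (Yu et al., 2020): each task gradient has its component along any conflicting task gradient removed, and the results are averaged. This assumes plain NumPy vectors as task gradients and is not code from any of the cited works.

```python
import numpy as np

def pcgrad(grads, rng):
    # PCGrad: for each task gradient g_i, iterate over the other tasks in
    # random order and, whenever g_i conflicts with g_j (negative inner
    # product), project g_i onto the normal plane of g_j.
    grads = [np.asarray(g, dtype=float) for g in grads]
    projected = []
    for i in range(len(grads)):
        g = grads[i].copy()
        for j in rng.permutation(len(grads)):
            if j == i:
                continue
            dot = float(g @ grads[j])
            if dot < 0.0:  # conflicting gradients
                g = g - dot / float(grads[j] @ grads[j]) * grads[j]
        projected.append(g)
    # combined update direction: average of the surgically adjusted gradients
    return np.mean(projected, axis=0)
```

Note that non-conflicting (orthogonal or aligned) gradients pass through unchanged, while directly opposed gradients cancel each other out.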
“…It further contrasts with the joint architecture proposed in Chapter 2, where the loss function is a weighted linear combination of multiple losses, each corresponding to a particular task. This multi-task approach presents the additional challenge of defining the weights so as to normalize the losses and avoid interference between the tasks' gradients [268-273].…”
Section: Towards Consistent Document-level Entity Linking: Joint Mode... (mentioning; confidence: 99%)
“…The first assumption is common in multi-objective approaches for multi-task learning (Liu et al., 2021; Navon et al., 2022). The last two assumptions are used in multi-constrained RL works (Kim et al., 2023) and are intended to ensure the existence of a solution that satisfies the constraints, which can be viewed as the Slater condition (Boyd & Vandenberghe, 2004).…”
Section: Convergence Analysis (mentioning; confidence: 99%)
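For reference, the Slater condition invoked in the excerpt above is the standard sufficient condition for strong duality in convex optimization (as in Boyd & Vandenberghe, 2004); a one-line statement:

```latex
% Slater's condition for the convex problem
%   \min f_0(x) \ \text{s.t.}\ f_i(x) \le 0,\ i=1,\dots,m,\quad Ax=b:
% strong duality holds if a strictly feasible point exists, i.e.
\exists\, x \in \operatorname{relint}(\mathcal{D}) :\quad
f_i(x) < 0,\ \ i = 1, \dots, m, \qquad Ax = b.
```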