2017
DOI: 10.1007/978-3-319-70139-4_10
Learning with Partially Shared Features for Multi-Task Learning

Cited by 10 publications (15 citation statements)
References 10 publications
“…Compared methods. We compare BayesAgg-MTL with the following baseline methods: (1) Single Task Learning (STL), which learns each task independently under the same experimental setup as the MTL methods; (2) Linear Scalarization (LS), which assigns a uniform weight to all tasks, i.e., minimizes ∑_{k=1}^{K} ℓ_k; (3) Scale-Invariant (SI) (Navon et al., 2022), which assigns a uniform weight to the log of all tasks, i.e., minimizes ∑_{k=1}^{K} log ℓ_k; (4) Random Loss Weighting (RLW) (Lin et al., 2022), which allocates random weights to the losses at each iteration; (5) Dynamic Weight Average (DWA) (Liu et al., 2019a), which allocates each task a weight based on the rate of change of its loss; (6) Uncertainty Weighting (UW) (Kendall et al., 2018), which minimizes a scalar term corresponding to the aleatoric uncertainty of each task; (7) Multiple-Gradient Descent Algorithm (MGDA) (Désidéri, 2012; Sener & Koltun, 2018), which finds a minimum-norm solution over convex combinations of the losses; (8) Projecting Conflicting Gradients (PCGrad) (Yu et al., 2020), which projects the gradient of each task onto the normal plane of the tasks it conflicts with; (9) Conflict-Averse Grad (CAGrad) (Liu et al., 2021), which searches for an update direction centered at the LS solution while minimizing conflicts among gradients; (10) Impartial MTL-Grad (IMTL-G) (Liu et al., 2020), which finds an update vector whose projection onto each task's gradient is equal; (11) Nash-MTL (Navon et al., 2022), which formulates MTL as a bargaining game; (12) (Dai et al., 2023), which suggests a reinforcement learning procedure to balance the task losses; (13) Aligned-MTL-UB (Senushkin et al., 2023), which aligns the principal components of a gradient matrix.…”
Section: Methods (mentioning; confidence: 99%)
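Several of the loss-weighting baselines enumerated above reduce to a few lines each. A minimal NumPy sketch of LS, SI, RLW, and DWA, using my own function names rather than any implementation from the paper:

```python
import numpy as np

def linear_scalarization(losses):
    # LS: uniform weights over the raw task losses, sum_k l_k
    return float(np.sum(losses))

def scale_invariant(losses):
    # SI: uniform weights over the log-losses, sum_k log l_k
    return float(np.sum(np.log(losses)))

def random_loss_weighting(losses, rng):
    # RLW: sample fresh random weights each iteration (softmax of Gaussians)
    w = rng.standard_normal(len(losses))
    w = np.exp(w) / np.exp(w).sum()
    return float(np.dot(w, losses))

def dynamic_weight_average(prev_losses, prev_prev_losses, T=2.0):
    # DWA: weight each task by the recent rate of change of its loss;
    # the K weights are normalized to sum to K (Liu et al., 2019a).
    r = np.asarray(prev_losses, dtype=float) / np.asarray(prev_prev_losses, dtype=float)
    return len(r) * np.exp(r / T) / np.exp(r / T).sum()
```

Each function returns a scalar objective (or, for DWA, a weight vector) that would then be backpropagated in place of a single-task loss.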
“…Nash-MTL (Navon et al., 2022) suggests treating MTL as a bargaining game to find Pareto-optimal solutions. Several studies have suggested adaptations of the multiple-gradient descent algorithm (MGDA) (Désidéri, 2012; Sener & Koltun, 2018), such as CAGrad (Liu et al., 2021) and MoCo (Fernando et al., 2023). In contrast to previous methods, our approach considers both the mean and the variance of the gradients to derive an update direction.…”
Section: Related Work (mentioning; confidence: 99%)
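To make the gradient-surgery family concrete, here is a minimal sketch of a PCGrad-style projection (Yu et al., 2020): each task gradient has its component along any conflicting task gradient removed, and the results are averaged. This assumes plain NumPy vectors as task gradients and is not code from any of the cited works.

```python
import numpy as np

def pcgrad(grads, rng):
    # PCGrad: for each task gradient g_i, iterate over the other tasks in
    # random order and, whenever g_i conflicts with g_j (negative inner
    # product), project g_i onto the normal plane of g_j.
    grads = [np.asarray(g, dtype=float) for g in grads]
    projected = []
    for i in range(len(grads)):
        g = grads[i].copy()
        for j in rng.permutation(len(grads)):
            if j == i:
                continue
            dot = float(g @ grads[j])
            if dot < 0.0:  # conflicting gradients
                g = g - dot / float(grads[j] @ grads[j]) * grads[j]
        projected.append(g)
    # combined update direction: average of the surgically adjusted gradients
    return np.mean(projected, axis=0)
```

Note that non-conflicting (orthogonal or aligned) gradients pass through unchanged, while directly opposed gradients cancel each other out.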
“…It further contrasts with the joint architecture proposed in Chapter 2, where the loss function is a weighted linear combination of multiple losses, each corresponding to a particular task. This multi-task approach presents the additional challenge of defining the weights so as to normalize the losses and avoid interference between the tasks' gradients [268-273].…”
Section: Towards Consistent Document-level Entity Linking: Joint Mode... (mentioning; confidence: 99%)
“…The first assumption is common in multi-objective approaches for multi-task learning (Liu et al., 2021; Navon et al., 2022). The last two assumptions are used in multi-constrained RL works (Kim et al., 2023) and are intended to ensure the existence of a solution that satisfies the constraints, which can be viewed as the Slater condition (Boyd & Vandenberghe, 2004).…”
Section: Convergence Analysis (mentioning; confidence: 99%)
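For reference, the Slater condition invoked in the excerpt above is the standard sufficient condition for strong duality in convex optimization (as in Boyd & Vandenberghe, 2004); a one-line statement:

```latex
% Slater's condition for the convex problem
%   \min f_0(x) \ \text{s.t.}\ f_i(x) \le 0,\ i=1,\dots,m,\quad Ax=b:
% strong duality holds if a strictly feasible point exists, i.e.
\exists\, x \in \operatorname{relint}(\mathcal{D}) :\quad
f_i(x) < 0,\ \ i = 1, \dots, m, \qquad Ax = b.
```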