2022
DOI: 10.1007/s10489-022-04028-8
Decentralized multi-task reinforcement learning policy gradient method with momentum over networks

Cited by 3 publications (2 citation statements); References 17 publications.
“…Multi-task RL in general studies efficiently solving the policy optimization tasks for multiple RL environments at the same time by leveraging connections between the tasks. Its most common mathematical formulation is to find a single policy that maximizes the (weighted) average of the cumulative returns collected across all environments, and Zeng et al (2021); Jiang et al (2022); Junru et al (2022); Chen et al (2022a) study various gradient-based algorithms that provably converge to global or local solutions of this objective. However, as pointed out in Hessel et al (2019), this average return formulation can be inadequate when modelling practical problems where the tasks have strong conflicting or imbalanced interests.…”
Section: Related Work
confidence: 99%
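For concreteness, the average-return formulation described in the excerpt above can be written as follows. This is a minimal sketch with illustrative notation (N tasks, task weights w_i ≥ 0 with Σ_i w_i = 1, discount factor γ, per-task reward r_t^{(i)}, and a single shared policy π_θ); the symbols are not taken from any of the cited papers.

\max_{\theta} \; J(\theta) \;=\; \sum_{i=1}^{N} w_i \, J_i(\theta), \qquad J_i(\theta) \;=\; \mathbb{E}_{\pi_\theta}\!\Big[ \sum_{t=0}^{\infty} \gamma^{t} \, r_t^{(i)} \Big]

Gradient-based methods of the kind surveyed above ascend \nabla_\theta J(\theta) = \sum_{i} w_i \nabla_\theta J_i(\theta), i.e., a weighted combination of the per-task policy gradients.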
“…We note that Eq. (3) obviously subsumes the unconstrained multi-task formulation in Zeng et al (2021); Jiang et al (2022); Junru et al (2022) by properly choosing {ℓ_i, u_i}. It can be shown that the multi-task policy optimization problem (even without constraints) does not satisfy the gradient domination condition in general, which makes it difficult for any gradient-based algorithm to find the globally optimal policy.…”
Section: Given a Policy π ∈ ∆(S)
confidence: 99%
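The constrained formulation referred to as Eq. (3) in the excerpt can be sketched as follows, assuming per-task lower and upper thresholds ℓ_i and u_i on the task returns J_i(π) (notation illustrative, not reproduced from the citing paper):

\max_{\pi \in \Delta(S)} \; \frac{1}{N} \sum_{i=1}^{N} J_i(\pi) \quad \text{s.t.} \quad \ell_i \;\le\; J_i(\pi) \;\le\; u_i, \qquad i = 1, \dots, N

Choosing ℓ_i = -∞ and u_i = +∞ for every task makes each constraint vacuous, which recovers the unconstrained average-return objective studied in Zeng et al (2021); Jiang et al (2022); Junru et al (2022).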