2019
DOI: 10.1007/s10458-019-09407-z

Decomposition methods with deep corrections for reinforcement learning

Abstract: Decomposition methods have been proposed to approximate solutions to large sequential decision making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks, considering each individual entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on …
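As a rough illustration of the idea summarized in the abstract, the sketch below shows an arbitrator that fuses per-entity utilities and selects an action in real time. The names, the per-entity Q-functions, and the sum fusion are assumptions made for illustration, not the paper's exact formulation.

```python
# Minimal sketch of utility decomposition with an arbitrator.
# Assumption (not from the abstract's text): each entity i contributes a
# local utility Q_i(s_i, a), and the arbitrator fuses them by summation
# before choosing the greedy action.
from typing import Callable, List, Sequence

State = Sequence[float]                 # local state for one entity (assumed)
LocalQ = Callable[[State, int], float]  # Q_i(s_i, a) for one entity (assumed)

def arbitrate(local_qs: List[LocalQ],
              local_states: List[State],
              actions: Sequence[int]) -> int:
    """Combine per-entity utilities and select an action in real time."""
    def fused_value(a: int) -> float:
        # Sum fusion: one common (assumed) way to combine local utilities.
        return sum(q(s, a) for q, s in zip(local_qs, local_states))
    return max(actions, key=fused_value)
```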

Cited by 8 publications (6 citation statements)
References 36 publications (74 reference statements)
“…We assume that Γ is always drivable. Note that we denote the control input as u and the actions of the RL agent as a_RL, which encodes trajectories of control inputs in our setting, as detailed later in (22). We use a projection operator proj(x) to map a state x ∈ X to a specified subset of its elements.…”
Section: Preliminaries, A. Notation
mentioning confidence: 99%
“…These individual functions are then combined in real time to solve the overall problem while sacrificing optimality. Deep-neural-network-based correction methods are then applied to learn correction terms to improve the performance [55].…”
Section: Utility Decomposition With Deep Correction Learning
mentioning confidence: 99%
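The deep correction idea described above can be sketched as a learned additive term on top of the fused decomposed utilities. The additive form and the small network below are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch: fused decomposed utility plus a learned correction term.
# Assumption: Q(s, a) is approximated as f({Q_i}) + delta(s, a), where
# delta is a small neural network trained to reduce the gap to the
# globally optimal value.
import torch
import torch.nn as nn

class CorrectionNet(nn.Module):
    """Small network producing one correction term per action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def corrected_q(fused_q: torch.Tensor, state: torch.Tensor,
                correction: CorrectionNet) -> torch.Tensor:
    # fused_q: [batch, n_actions] utilities from the decomposition;
    # the learned term adjusts them toward the global solution.
    return fused_q + correction(state)
```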
“…Each subtask is then solved in isolation. Whereas solving the global problem suffers from the curse of dimensionality, solving each isolated subtask requires considerably less computational power [55]. At the end of the process, each solution is combined through an approximate function f, also called the utility fusion function, such as:…”
Section: Utility Decomposition
mentioning confidence: 99%
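The excerpt elides the concrete fusion function f. As illustrative examples only (assumed here, not taken from the paper), sum and min fusion are two common ways per-subtask utilities can be combined:

```python
# Illustrative (assumed) utility fusion functions f over subtask utilities.
from typing import Sequence

def fuse_sum(utilities: Sequence[float]) -> float:
    """Additive fusion: global utility as the sum of subtask utilities."""
    return sum(utilities)

def fuse_min(utilities: Sequence[float]) -> float:
    """Worst-case fusion: global utility limited by the weakest subtask."""
    return min(utilities)
```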