2017
DOI: 10.48550/arxiv.1711.08946
Preprint

Action Branching Architectures for Deep Reinforcement Learning

Abstract: Discrete-action algorithms have been central to numerous recent successes of deep reinforcement learning. However, applying these algorithms to high-dimensional action tasks requires tackling the combinatorial increase of the number of possible actions with the number of action dimensions. This problem is further exacerbated for continuous-action tasks that require fine control of actions via discretization. In this paper, we propose a novel neural architecture featuring a shared decision module followed by several network branches, one for each action dimension.
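
The architecture described in the abstract is easy to sketch: a shared torso produces a common state representation, and one small head per action dimension outputs Q-values over that dimension's sub-actions, so the output count grows linearly rather than combinatorially. A minimal PyTorch sketch under assumed layer widths and dimension sizes (the paper's full BDQ agent additionally uses a dueling state-value stream, omitted here):

```python
import torch
import torch.nn as nn

class BranchingQNet(nn.Module):
    """Shared decision module followed by one Q-value branch per action dimension.

    Layer widths and sub-action counts are illustrative assumptions, not values
    taken from the paper.
    """
    def __init__(self, obs_dim: int, num_dims: int, actions_per_dim: int):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # One head per action dimension: num_dims * actions_per_dim outputs
        # in total, instead of actions_per_dim ** num_dims joint actions.
        self.branches = nn.ModuleList(
            nn.Linear(128, actions_per_dim) for _ in range(num_dims)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.shared(obs)
        # Shape: (batch, num_dims, actions_per_dim)
        return torch.stack([branch(h) for branch in self.branches], dim=1)

# Greedy action selection is an independent argmax per dimension.
net = BranchingQNet(obs_dim=8, num_dims=3, actions_per_dim=5)
q = net(torch.randn(4, 8))   # (4, 3, 5)
actions = q.argmax(dim=-1)   # (4, 3): one sub-action per action dimension
```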

Cited by 7 publications (11 citation statements). References 10 publications.
“…In a branching neural network for policy-function estimation, the action space can be separated into several dimensions, and each network branch estimates a policy function over one action dimension. In [33], Tavakoli et al. have demonstrated the effectiveness of this branching network model.…”
Section: ( )
confidence: 99%
“…The resource controller uses a deep Q-network with an action-branching architecture [14] as depicted in Figure 3(b). This architecture features a shared representation layer, followed by distinct action branches, one for each resource control knob.…”
Section: Model Architecture
confidence: 99%
“…However, it is not straightforward to properly define the advantage of an executed action, since the size of the action space A[i] is time-varying for each decision time i. To handle this technical issue, we adapt a branching dueling Q-network architecture [36] and measure the advantage of an executed action separately for each time slot. In particular, we consider action branches [1 : K_max], i.e., time slots [1 : K_max].…”
Section: B. DQN Architecture
confidence: 99%
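
The per-branch advantage measurement this statement adapts comes from the branching dueling aggregation in the cited paper: a shared state value V(s) is combined with each branch's advantages after centering them by that branch's own mean. A minimal sketch of that aggregation (tensor shapes are illustrative assumptions):

```python
import torch

def branch_dueling_q(value: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    """Per-branch dueling aggregation.

    value:      (batch, 1)                 shared state value V(s)
    advantages: (batch, num_dims, n_sub)   per-branch advantages A_d(s, a_d)
    returns:    (batch, num_dims, n_sub)   Q_d(s, a_d) = V(s) + (A_d - mean over a'_d of A_d)
    """
    # Center each branch's advantages by its own mean, then broadcast-add V(s).
    centered = advantages - advantages.mean(dim=-1, keepdim=True)
    return value.unsqueeze(-1) + centered
```

Centering per branch keeps each branch's Q-values identifiable despite the shared value stream, which is what makes a separate advantage estimate per time-slot branch well defined in the citing work.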