2017
DOI: 10.48550/arxiv.1711.08946
Preprint

Action Branching Architectures for Deep Reinforcement Learning

Abstract: Discrete-action algorithms have been central to numerous recent successes of deep reinforcement learning. However, applying these algorithms to high-dimensional action tasks requires tackling the combinatorial increase of the number of possible actions with the number of action dimensions. This problem is further exacerbated for continuous-action tasks that require fine control of actions via discretization. In this paper, we propose a novel neural architecture featuring a shared decision module followed by several network branches, one for each action dimension.
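
The architecture described in the abstract is easy to sketch: a shared torso produces a common state representation, and one small head per action dimension outputs Q-values over that dimension's sub-actions, so the output count grows linearly rather than combinatorially. A minimal PyTorch sketch under assumed layer widths and dimension sizes (the paper's full BDQ agent additionally uses a dueling state-value stream, omitted here):

```python
import torch
import torch.nn as nn

class BranchingQNet(nn.Module):
    """Shared decision module followed by one Q-value branch per action dimension.

    Layer widths and sub-action counts are illustrative assumptions, not values
    taken from the paper.
    """
    def __init__(self, obs_dim: int, num_dims: int, actions_per_dim: int):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # One head per action dimension: num_dims * actions_per_dim outputs
        # in total, instead of actions_per_dim ** num_dims joint actions.
        self.branches = nn.ModuleList(
            nn.Linear(128, actions_per_dim) for _ in range(num_dims)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.shared(obs)
        # Shape: (batch, num_dims, actions_per_dim)
        return torch.stack([branch(h) for branch in self.branches], dim=1)

# Greedy action selection is an independent argmax per dimension.
net = BranchingQNet(obs_dim=8, num_dims=3, actions_per_dim=5)
q = net(torch.randn(4, 8))   # (4, 3, 5)
actions = q.argmax(dim=-1)   # (4, 3): one sub-action per action dimension
```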

Cited by 7 publications (11 citation statements). References 10 publications.
“…In a branching neural network for policy-function estimation, the action space can be separated into several dimensions, and each network branch estimates a policy function over one action dimension. In [33], Tavakoli et al. have demonstrated the effectiveness of this branching network model.…”
Section: ( )
confidence: 99%
“…The resource controller uses a deep Q-network with an action-branching architecture [14] as depicted in Figure 3(b). This architecture features a shared representation layer, followed by distinct action branches, one for each resource control knob.…”
Section: Model Architecture
confidence: 99%
“…However, it is not straightforward to properly define the advantage of an executed action, since the size of the action space A[i] is time-varying for each decision time i. To handle this technical issue, we adapt a branching dueling Q-network architecture [36] and measure the advantage of an executed action separately for each time slot. In particular, we consider action branches [1 : K_max], i.e., time slots [1 : K_max].…”
Section: B. DQN Architecture
confidence: 99%
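
The per-branch advantage measurement this statement adapts comes from the branching dueling aggregation in the cited paper: a shared state value V(s) is combined with each branch's advantages after centering them by that branch's own mean. A minimal sketch of that aggregation (tensor shapes are illustrative assumptions):

```python
import torch

def branch_dueling_q(value: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    """Per-branch dueling aggregation.

    value:      (batch, 1)                 shared state value V(s)
    advantages: (batch, num_dims, n_sub)   per-branch advantages A_d(s, a_d)
    returns:    (batch, num_dims, n_sub)   Q_d(s, a_d) = V(s) + (A_d - mean over a'_d of A_d)
    """
    # Center each branch's advantages by its own mean, then broadcast-add V(s).
    centered = advantages - advantages.mean(dim=-1, keepdim=True)
    return value.unsqueeze(-1) + centered
```

Centering per branch keeps each branch's Q-values identifiable despite the shared value stream, which is what makes a separate advantage estimate per time-slot branch well defined in the citing work.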