IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) 2019
DOI: 10.1109/infcomw.2019.8845154
Delay-Optimal Traffic Engineering through Multi-agent Reinforcement Learning

Cited by 27 publications (6 citation statements). References 18 publications.
“…To solve the above MA-MDP problem, we exploit multi-agent reinforcement learning, where the routers (agents) distributively learn the optimal target forwarding policy π to minimize the average server-worker delay. To implement the multi-agent reinforcement learning algorithm, we adopt a distributed actor-critic architecture similar to asynchronous advantage actor-critic (A3C) [26], [27], where each router individually runs a local critic and a local actor,…”
Section: B. Convergence Optimization via Multi-agent Reinforcement Learning
confidence: 99%
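For orientation, a minimal per-router actor-critic sketch in PyTorch is given below, assuming a discrete next-hop action space and hypothetical state and return shapes. It only illustrates the "local actor plus local critic" layout the statement describes; RouterAgent, a2c_loss, and all dimensions are placeholders, not the cited paper's implementation.

```python
# Minimal per-router actor-critic sketch (hypothetical shapes and names);
# illustrates a local actor + local critic per router, not the cited code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RouterAgent(nn.Module):
    """One agent per router: a shared trunk, a policy head (actor) over
    next-hop choices, and a value head (critic)."""
    def __init__(self, state_dim: int, num_next_hops: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, num_next_hops)   # logits over next hops
        self.critic = nn.Linear(hidden, 1)              # state-value estimate

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        return self.actor(h), self.critic(h)

def a2c_loss(agent, state, action, ret, value_coef=0.5, entropy_coef=0.01):
    """Advantage actor-critic loss for a batch of transitions; `ret` is an
    (e.g. n-step) return target computed elsewhere from observed delays."""
    logits, value = agent(state)
    dist = torch.distributions.Categorical(logits=logits)
    advantage = ret - value.squeeze(-1)
    policy_loss = -(dist.log_prob(action) * advantage.detach()).mean()
    value_loss = F.mse_loss(value.squeeze(-1), ret)
    entropy = dist.entropy().mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```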
“…Model and Dataset: We use FEMNIST, the federated version of MNIST [41], on the LEAF [30] character recognition task, where LEAF is a benchmarking framework for federated learning. FEMNIST consists of handwritten digits (10), uppercase (26), and lowercase (26) letters, for a total of 62 classes, with each image having 28×28 pixels. The whole dataset is partitioned into 3550 data portions/users with a non-IID data distribution.…”
Section: A. Experiments Setup
confidence: 99%
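As a rough illustration of the task the statement describes, here is a small CNN sized for FEMNIST-style inputs (1×28×28 grayscale images, 62 output classes). The layer widths are assumptions for the sketch, not the configuration used in the cited experiments.

```python
# Small CNN for FEMNIST-style inputs (1x28x28, 62 classes); sizes are illustrative.
import torch.nn as nn

class FEMNISTNet(nn.Module):
    def __init__(self, num_classes: int = 62):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 7 * 7, 128), nn.ReLU(), nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```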
“…The deep Q-routing algorithm proposed in [148] uses the regular TD prediction to calculate the expected long-term return, which may lead to high bias since the TD prediction only considers the impact of the next hop on the expected return. To overcome this issue, the authors in [150] propose a spatial difference (SD) prediction approach for the packet routing problem. The SD prediction approach leverages the number of hops from the current router node as the standard to outline different Q-value estimation methods, e.g., 1-hop or n-hop action-value estimation.…”
Section: Packet Routing
confidence: 99%
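The contrast between a regular 1-hop TD target and an n-hop target that folds in delays observed over several downstream hops can be sketched as below. The function names, delay units, and the choice to bootstrap from a minimum Q-value are assumptions for illustration, not the exact estimators of [148] or [150].

```python
# Hedged sketch: 1-hop (regular TD) target vs n-hop target for Q-routing.
from typing import List

def one_hop_target(delay_to_next: float, q_next_min: float, gamma: float = 1.0) -> float:
    """Regular TD target: per-hop delay plus the next router's best Q-estimate."""
    return delay_to_next + gamma * q_next_min

def n_hop_target(hop_delays: List[float], q_bootstrap_min: float, gamma: float = 1.0) -> float:
    """n-hop target: accumulate the observed delays over the next n hops,
    then bootstrap from the Q-estimate held at the n-th router."""
    target, discount = 0.0, 1.0
    for d in hop_delays:
        target += discount * d
        discount *= gamma
    return target + discount * q_bootstrap_min

# Example: three observed 2 ms hops, then bootstrap from a 10 ms estimate.
# one_hop_target(2.0, 12.0) -> 14.0; n_hop_target([2.0, 2.0, 2.0], 10.0) -> 16.0
```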
“…Precisely, an MDP provides an abstract representation of learning problems in which an agent interacts with its environment to achieve a target control and optimization goal. This work addresses the objective by representing the TE problem as a multi-agent MDP (MA-MDP) [157][158][159].…”
Section: RNN-LSTM Inference Model Creation
confidence: 99%
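A minimal way to write down such an MA-MDP for a traffic engineering setting is sketched below; all names, types, and the choice of reward (negative average delay) are illustrative assumptions, not the formulation of the cited works.

```python
# Hedged sketch of a multi-agent MDP (MA-MDP) tuple for traffic engineering:
# one action per agent (router) at each step; all names are placeholders.
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[float, ...]        # e.g. link utilizations / queue lengths
JointAction = Dict[str, int]     # router id -> chosen next hop (or split index)

@dataclass
class MultiAgentMDP:
    agents: Sequence[str]                                # router identifiers
    transition: Callable[[State, JointAction], State]    # network dynamics model
    reward: Callable[[State, JointAction], float]        # e.g. negative average delay
    gamma: float = 0.99                                  # discount factor
```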