2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA)
DOI: 10.1109/icmla.2017.0-184
Learning to Coordinate with Deep Reinforcement Learning in Doubles Pong Game

Cited by 24 publications (12 citation statements). References 13 publications.
“…Through this independence degree, the agent learns to decide whether it needs to act independently or cooperate with other agents in different circumstances. Likewise, Diallo et al. [18] extended DQN to a multi-agent concurrent DQN and demonstrated that this method can converge in a non-stationary environment. Foerster et al. [25] alternatively introduced two methods for stabilising experience replay of DQN in MADRL.…”
Section: Non-stationarity (confidence: 99%)
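The concurrent-learning idea in this quote can be illustrated with a minimal sketch: every agent updates its own Q-function on the same shared transition, which is exactly what makes the environment non-stationary from each agent's point of view. This is a sketch under stated assumptions, not the method of Diallo et al. [18]: a linear Q-function over a feature vector stands in for the deep network, and the environment is a stub.

```python
# Sketch of concurrent independent Q-learners. A linear Q-function stands in
# for the deep network of a DQN; env_step is a placeholder, not doubles pong.
import numpy as np

N_AGENTS, OBS_DIM, N_ACTIONS = 2, 8, 3
ALPHA, GAMMA, EPS = 0.01, 0.99, 0.1

rng = np.random.default_rng(0)
# One independent weight matrix (OBS_DIM x N_ACTIONS) per agent.
weights = [rng.normal(scale=0.1, size=(OBS_DIM, N_ACTIONS)) for _ in range(N_AGENTS)]

def q_values(w, obs):
    return obs @ w  # linear stand-in for a DQN forward pass

def select_action(w, obs):
    if rng.random() < EPS:  # epsilon-greedy exploration
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(w, obs)))

def env_step(actions):
    """Stub environment: returns a next observation and a shared reward."""
    next_obs = rng.normal(size=OBS_DIM)
    reward = float(sum(actions))  # placeholder team reward
    return next_obs, reward

obs = rng.normal(size=OBS_DIM)
for t in range(1000):
    actions = [select_action(w, obs) for w in weights]
    next_obs, reward = env_step(actions)
    # Concurrent update: each agent adjusts its own Q-function on the same
    # transition, so every agent's effective environment keeps shifting as
    # the other agents learn.
    for w, a in zip(weights, actions):
        td_target = reward + GAMMA * np.max(q_values(w, next_obs))
        td_error = td_target - q_values(w, obs)[a]
        w[:, a] += ALPHA * td_error * obs
    obs = next_obs
```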
“…[Table: MADRL methods grouped by challenge and method family]

Partial observability
  Value-based: DRQN [36]; DDRQN [24]; RIAL and DIAL [23]; Action-specific DRQN [121]; MT-MARL [85]; PS-DQN [30]; RL as a Rehearsal (RLaR) [55]
  Actor-critic: PS-DDPG and PS-A3C [30]; MADDPG-M [48]
  Policy-based: DPIQN and DRPIQN [42]; PS-TRPO [30]; Bayesian action decoder (BAD) [26]

Non-stationarity
  Value-based: DRUQN and DLCQN [12]; Multi-agent concurrent DQN [18]; Recurrent DQN-based multi-agent importance sampling and fingerprints [25]; Hysteretic-DQN [85]; Lenient-DQN [86]; WDDQN [120]
  Actor-critic: MADDPG [68]; PS-A3C [30]
  Policy-based: PS-TRPO [30]…”
Section: Challenges (confidence: 99%)
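Of the non-stationarity remedies in this table, the fingerprint technique of Foerster et al. [25] lends itself to a small sketch: each replayed transition is tagged with a low-dimensional record of when it was generated (e.g., training iteration and exploration rate), which serves as a proxy for the other agents' policies at that time. The class and field names below are illustrative assumptions, not the authors' implementation, and observations are assumed to be plain Python lists.

```python
# Sketch of a replay buffer that stores a training-time "fingerprint" with
# each transition, in the spirit of Foerster et al. [25]. Illustrative only.
import collections
import random

Transition = collections.namedtuple(
    "Transition", ["obs", "fingerprint", "action", "reward", "next_obs"])

class FingerprintReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = collections.deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, iteration, epsilon):
        # The fingerprint records *when* the transition was generated,
        # a proxy for the other agents' policies at that point in training.
        self.buffer.append(
            Transition(obs, (iteration, epsilon), action, reward, next_obs))

    def sample(self, batch_size):
        batch = random.sample(list(self.buffer), batch_size)
        # Append the fingerprint to the observation so the Q-network can
        # disambiguate data generated under different co-player policies.
        return [(t.obs + list(t.fingerprint), t.action, t.reward, t.next_obs)
                for t in batch]
```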
“…In [85], Elhadji et al. embedded MADRL in distributed agents, which are computer players in a pong game (A.6). The agents have global goals (i.e., winning as a team) and independent decision-making capabilities influenced by one another's decisions.…”
Section: Elhadji's MADRL with Concurrent Learning (confidence: 99%)
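The coordination structure this quote describes, independent action selection with a single global team objective, can be sketched as a thin wrapper that broadcasts one shared reward to both teammates. The environment interface below is an assumption for illustration, not the setup used in the paper.

```python
# Hypothetical wrapper giving two independently acting teammates one shared
# team reward, as in "winning as a team". The wrapped env is assumed to
# expose reset() and step(actions); this is not the paper's environment.
class TeamRewardWrapper:
    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()  # one observation per agent

    def step(self, actions):
        """actions: one action per teammate, each chosen independently."""
        observations, team_reward, done = self.env.step(actions)
        # Broadcast the single global reward to every teammate, so both
        # agents optimise the same team-level goal.
        rewards = [team_reward for _ in actions]
        return observations, rewards, done
```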
“…DRUQN tries to avoid policy bias by updating the value of the action inversely proportional to the probability of selecting that action. Diallo et al. [130] proposed a multi-agent concurrent DQN algorithm able to converge in a non-stationary environment. Lenient-DQN, conceived by Palmer et al. [131], utilizes leniency with decaying temperature values for adjusting the policy updates sampled from the experience replay memory to deal with the non-stationarity caused by concurrent learning.…”
Section: Multi-agent Reinforcement Learning (confidence: 99%)
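The two update rules this quote summarizes can both be written down in a few lines of tabular pseudocode. The sketch below is an assumption-laden simplification: the DRUQN-style rule scales the learning rate by the inverse of the action-selection probability, and the leniency schedule (1 - e^-T) is one common parameterisation, not the published Lenient-DQN algorithm.

```python
# Simplified tabular sketches of the two update rules in the quote.
import math
import random

def druqn_update(q, state, action, td_target, pi_prob, base_lr=0.1):
    """DRUQN-style update [12]: learning rate scaled inversely to the
    probability of having selected the action, so frequently chosen
    actions do not dominate the value estimates (policy bias)."""
    lr = min(1.0, base_lr / max(pi_prob, 1e-6))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + lr * (td_target - old)

def lenient_update(q, state, action, td_target, temperature, base_lr=0.1):
    """Lenient update in the spirit of Lenient-DQN [131]: while the
    temperature is high, most negative TD errors are forgiven; as the
    temperature decays, negative updates are applied more often.
    The leniency schedule here is an assumed parameterisation."""
    old = q.get((state, action), 0.0)
    td_error = td_target - old
    leniency = 1.0 - math.exp(-temperature)  # high temperature => lenient
    if td_error > 0 or random.random() > leniency:
        q[(state, action)] = old + base_lr * td_error
```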