2019 IEEE International Conference on Industrial Technology (ICIT)
DOI: 10.1109/icit.2019.8755032

Multi-Agent Deep Reinforcement Learning with Human Strategies

Abstract: Deep learning has enabled traditional reinforcement learning methods to deal with high-dimensional problems. However, one of the disadvantages of deep reinforcement learning methods is the limited exploration capacity of learning agents. In this paper, we introduce an approach that integrates human strategies to increase the exploration capacity of multiple deep reinforcement learning agents. We also report the development of our own multi-agent environment called Multiple Tank Defence to simulate the proposed a…
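The idea sketched in the abstract — drawing on human strategies to widen agents' exploration — can be illustrated with a simple action-selection rule. The snippet below is a hypothetical sketch, not the paper's actual method: `human_policy`, `learned_policy`, and the mixing probabilities `beta` and `epsilon` are all assumptions for illustration.

```python
import random

def select_action(state, learned_policy, human_policy,
                  beta=0.2, epsilon=0.1, n_actions=5):
    """Pick an action, occasionally deferring to a human strategy.

    `learned_policy` and `human_policy` are hypothetical callables that
    map a state to an action index in range(n_actions). With probability
    `beta` the agent follows the human strategy, with probability
    `epsilon` it explores uniformly at random, and otherwise it follows
    its own learned policy.
    """
    r = random.random()
    if r < beta:
        return human_policy(state)          # exploit human knowledge
    if r < beta + epsilon:
        return random.randrange(n_actions)  # uniform random exploration
    return learned_policy(state)            # follow the learned policy
```

Compared with plain epsilon-greedy exploration, the human-strategy branch biases exploration toward state regions a human would visit, which is one plausible way to read "increasing exploration capacity".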

Cited by 17 publications (9 citation statements)
References 15 publications
“…Establishing communication channels among agents during learning is an important step in designing and constructing MADRL algorithms. Nguyen et al. [82] characterized the communication channel via human knowledge represented by images and allowed deep RL agents to communicate using these shared images. The asynchronous advantage actor-critic (A3C) algorithm [74] is used to learn the optimal policy for each agent, which can be extended to multiple heterogeneous agents.…”
Section: MADRL Applications
confidence: 99%
“…These pose important research questions towards extensions of imitation learning and inverse RL to MADRL methods. In addition, for complicated tasks or behaviors that are difficult for humans to demonstrate, there is a need for alternative methods that allow human preferences to be integrated into deep RL [13,81,82].…”
Section: Conclusion and Research Directions
confidence: 99%
“…As a result, it is possible to infer trajectories in a dynamic environment based on the conditional distribution. Reinforcement learning (RL) has become a promising approach to modeling an autonomous agent [24]–[28]. RL has the ability to mimic human learning behaviors to maximize the long-term reward.…”
Section: Related Work
confidence: 99%
“…The action policy is used to describe the agent's behavior, which specifies the way in which the agent chooses an action from a state. If the action policy, h: X → U, does not change over time, it is considered stationary [20].…”
Section: Single-Agent Case
confidence: 99%
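A stationary deterministic policy h: X → U of the kind described above can be illustrated as a fixed lookup table; the state set X and action set U below are hypothetical:

```python
# Illustrative state space X and action space U (not from the cited work).
X = ["low", "medium", "high"]
U = ["wait", "act"]

# A stationary deterministic policy h: X -> U as a fixed mapping.
h = {"low": "wait", "medium": "wait", "high": "act"}

def policy(x):
    """h does not depend on time: querying the same state at any step
    always yields the same action, which is what stationarity means."""
    return h[x]
```

A non-stationary policy, by contrast, would take the time step as an additional argument and could map the same state to different actions at different times.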