2021
DOI: 10.3390/e23091133

Attention-Based Fault-Tolerant Approach for Multi-Agent Reinforcement Learning Systems

Abstract: Multi-agent reinforcement learning systems aim to give interacting agents the ability to learn collaboratively and adapt to the behavior of other agents. Typically, each agent receives private observations that provide only a partial view of the true state of the environment. In realistic settings, however, a harsh environment might cause one or more agents to behave in arbitrarily faulty or malicious ways, which may be enough to make current coordination mechanisms fail. In this paper, we stud…

Cited by 10 publications (7 citation statements)
References 21 publications
“…Others use metrics to identify relevant messages, such as attention mechanisms. In its simplest form, this is a vector of importance weights (Peng et al 2018; Gu et al 2021; Mao et al 2020). An alternative is to keep confidence scores about states (Da Silva et al 2017).…”
Section: Communication (mentioning)
confidence: 99%
“…To ensure the policy consistency of this multiagent cooperative routing model, Ref. [50] used the latest multiagent deep deterministic policy gradient (MADDPG) algorithm [51] to train the model. The final experimental results show that the reinforcement learning intelligent routing algorithm based on the offline link weight has better load balancing characteristics than shortest path routing, that is, a shorter average router waiting time.…”
Section: Intelligent Routing Algorithm Based On Deep Reinforcement Le... (mentioning)
confidence: 99%
“…However, existing attention-based CTDE MADRL methods allow inter-agent communication only during the centralized training phase [12]-[14], so they can be brittle during the distributed execution phase in dynamic environments. Furthermore, for multi-UAV path planning, these methods consider too many [13] or too few attention heads [12], [14]. By conducting numerical experiments, we found that a single attention head can attend to up to one other UAV agent (e.g., the nearest UAV), which may not be sufficient to avoid collision in a congested area (e.g., near the destination).…”
Section: Introduction (mentioning)
confidence: 99%
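
Several of the statements above describe attention, in its simplest form, as a vector of importance weights over incoming messages. The following is a minimal sketch of that idea, not the method of the cited paper or of any citing work: it assumes scaled dot-product scoring followed by a softmax, and the function names (`attention_weights`, `aggregate`) and the toy vectors are hypothetical, chosen only for illustration.

```python
import math


def attention_weights(query, keys):
    """Return one importance weight per message (softmax of scaled dot products).

    Assumption: dot-product scoring; other scoring functions are possible.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(weights, messages):
    """Combine message vectors into one summary using the importance weights."""
    dim = len(messages[0])
    return [sum(w * m[i] for w, m in zip(weights, messages)) for i in range(dim)]


# Hypothetical data: one agent's query and three messages from other agents.
query = [1.0, 0.0]
messages = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

weights = attention_weights(query, messages)
summary = aggregate(weights, messages)
```

In this sketch the message most aligned with the query gets the largest weight, so a faulty or irrelevant message can be down-weighted rather than discarded outright. A single softmax of this kind tends to concentrate on one message, which is consistent with the observation quoted above that one attention head may attend mainly to a single other agent.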