2020 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn48605.2020.9206879

Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward

Cited by 30 publications (15 citation statements) | References 8 publications
“…The DE-MADDPG algorithm is an extended version of the MADDPG algorithm, which improves the network architecture of the MADDPG algorithm [33]. The MADDPG algorithm implements centralized training through a global centralized critic network.…”
Section: DE-MADDPG Approach
confidence: 99%
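The dual-critic layout described in the excerpt above (one shared global critic for centralized training plus one local critic per agent) can be sketched as follows; class and attribute names are illustrative only, not taken from the paper's implementation:

```python
# Minimal sketch of the DE-MADDPG critic layout, assuming one shared
# global critic Q_psi and one local critic Q_phi_i per agent.
class DualCriticAgents:
    def __init__(self, n_agents):
        # one global critic shared by all agents (centralized training)
        self.global_critic = {"params": "psi"}
        # one local critic per agent i, each with its own parameters phi_i
        self.local_critics = [{"params": f"phi_{i}"} for i in range(n_agents)]

agents = DualCriticAgents(3)
# -> 1 shared global critic, 3 per-agent local critics
```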
See 1 more Smart Citation
“…The DE-MADDPG algorithm is an extended version of the MADDPG algorithm, which improves the network architecture of the MADDPG algorithm [33]. The MADDPG algorithm implements centralized training through a global centralized critic network.…”
Section: De-maddpg Approachmentioning
confidence: 99%
“…where D is the experience replay buffer, Q_{φ_i} is the local critic network of agent i with parameters φ_i, and Q_ψ is the global critic network with parameters ψ [33].…”
Section: DE-MADDPG Approach
confidence: 99%
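The update the excerpt refers to bootstraps each critic toward a standard TD target of the form y = r + γ·Q(s′, a′), computed separately against the global and the local reward. A hedged sketch, with hypothetical variable names (the paper's own code is not reproduced here):

```python
# Sketch of the TD targets for the two critics in a dual-critic setup:
# the shared global critic Q_psi regresses toward y_global, and agent i's
# local critic Q_phi_i regresses toward y_local.
def td_targets(r_global, r_local, q_global_next, q_local_next, gamma=0.95):
    """Return TD targets y = r + gamma * Q(s', a') for the global and
    local critics, given next-state critic estimates."""
    y_global = r_global + gamma * q_global_next  # target for Q_psi
    y_local = r_local + gamma * q_local_next     # target for Q_phi_i
    return y_global, y_local

y_g, y_l = td_targets(r_global=1.0, r_local=0.5,
                      q_global_next=2.0, q_local_next=1.0, gamma=0.9)
# y_g = 1.0 + 0.9 * 2.0 = 2.8; y_l = 0.5 + 0.9 * 1.0 = 1.4
```

Each critic is then trained to minimize the squared error between its prediction on transitions sampled from the replay buffer D and its corresponding target.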
“…RL techniques have been well studied, and applying them to multi-agent systems is a recent open discussion. The objectives of state-of-the-art MARL algorithms fall into two categories [11]. One is to maximize the global reward for success as a team, as in COMA [10].…”
Section: A Concept of RL and MARL
confidence: 99%
“…In addition, "target policy smoothing" was introduced, which adds clipped Gaussian noise to the selected action to avoid overfitting to narrow peaks in the value estimate, a concern with deterministic policies. Recently, TD3 has been applied to multi-agent systems in several works as multi-agent TD3 (MATD3) [18], [11]. [18] has a decentralized actor-critic structure similar to [13], but uses a TD3 network instead of DDPG.…”
Section: B Policy Gradient Algorithms
confidence: 99%
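The target policy smoothing mentioned in the excerpt above can be sketched in a few lines: clipped Gaussian noise is added to the target policy's action, and the result is clipped back into the valid action range. Parameter names (`sigma`, `noise_clip`, `act_limit`) follow common TD3 conventions but are assumptions here, not taken from the cited implementations:

```python
import numpy as np

# Sketch of TD3-style target policy smoothing: a' = clip(mu(s') + eps),
# where eps ~ clip(N(0, sigma), -noise_clip, noise_clip).
def smoothed_target_action(mu_target, sigma=0.2, noise_clip=0.5,
                           act_limit=1.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    noise = np.clip(rng.normal(0.0, sigma, size=np.shape(mu_target)),
                    -noise_clip, noise_clip)
    # clip the perturbed action back into the valid range
    return np.clip(mu_target + noise, -act_limit, act_limit)

a = smoothed_target_action(np.array([0.9, -0.9]))
# every component of a stays within [-act_limit, act_limit]
```

Smoothing the target action in this way regularizes the critic: similar actions receive similar value estimates, so a deterministic policy cannot exploit a single sharp, erroneous peak in Q.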