2022
DOI: 10.1016/j.future.2022.06.015

Target localization using Multi-Agent Deep Reinforcement Learning with Proximal Policy Optimization

Cited by 34 publications (8 citation statements)
References 25 publications
“…It is the average of the total distance to obstacles, angular jerk, linear jerk, and lane center offset for each episode. 4) Rules: it refers to the total number of rules violated, such as lane changing, Wrong Way, and Speed Overlimit, for each episode. It is important to note that since we take the average for each episode, there can be multiple agents (red cars) in a single episode. This leads to higher values in our evaluation, and the same holds for all other participating teams in the competition.…”
Footnote 6: Difference given here: https://stats.stackexchange.com/questions/184657/what-is-the-difference-between-off-policy-and-on-policy-learning
Footnote 7: Blog post for an introduction to Q-Learning: https://medium.com/intro-to-artificial-intelligence/q-learning-a-value-based-reinforcement-learning-algorithm-272706d835cf
Section: Results and Comparative Analysis (mentioning)
confidence: 99%
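The excerpt above describes how the competition metrics are aggregated per episode. Below is a minimal sketch of that aggregation, not the cited authors' evaluation code; the field names (distance_to_obstacles, rule_violations, and so on) are assumptions chosen for illustration.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AgentStep:
    # One simulation step recorded for one agent (red car); field names are assumed.
    distance_to_obstacles: float
    angular_jerk: float
    linear_jerk: float
    lane_center_offset: float
    rule_violations: int  # lane changing, Wrong Way, Speed Overlimit, ...

def episode_metrics(steps: List[AgentStep]) -> Dict[str, float]:
    """Aggregate the smoothness average and total rule violations over one episode.

    Steps from every agent in the episode are pooled, so episodes with more
    agents naturally report larger totals, as the excerpt notes.
    """
    smoothness = sum(
        s.distance_to_obstacles + s.angular_jerk + s.linear_jerk + s.lane_center_offset
        for s in steps
    ) / len(steps)
    rules = sum(s.rule_violations for s in steps)
    return {"smoothness": smoothness, "rules": float(rules)}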
“…Further, an adaptive clipping approach for PPO [5] was later developed by building on prior work. Since the inception of PPO, there have been various cooperative and multi-agent Proximal Policy Optimization implementations for use cases such as target localization [6], online scheduling for production [7], and healthcare [8]. Moreover, convolutions have been combined with RL approaches, as in CMAPPO (Convolutional Multi-Agent PPO) [9], which learns to explore a new environment effectively by combining convolutions (for RGBD+ information), curriculum-based learning, and motivation-based reinforcement learning.…”
Section: A Multi-agent Proximal Policy Optimization (mentioning)
confidence: 99%
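The excerpt refers to PPO's clipping mechanism and the multi-agent variants built on it. For reference, a minimal sketch of the standard clipped surrogate loss those variants share is given below; it is not the implementation from the indexed paper or the cited works, and the tensor names are assumptions.

import torch

def ppo_clip_loss(log_probs_new: torch.Tensor,
                  log_probs_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate: L = -E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    ratio = torch.exp(log_probs_new - log_probs_old)  # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negated because optimizers minimize, while PPO maximizes the surrogate.
    return -torch.min(unclipped, clipped).mean()

The adaptive clipping approach cited as [5] presumably varies this clipping range during training rather than keeping clip_eps fixed.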
“…[16], BGGIW is adopted to approximate the target birth intensity and the potential target intensity. There are also other target tracking methods, such as energy-based auto-regressive neural systems [21,22], deep learning strategies [23,24], reinforcement learning [25-27], and genetic algorithms [28].…”
Section: Related Work (mentioning)
confidence: 99%
“…RL algorithms are slow to converge, as most of the time is spent on exploration in the early stages of learning. There are multiple learning speedup techniques for RL, such as offline learning, dynamic exploration, transfer learning, imitation learning, and reward shaping [23,1]. Reward shaping alters the original reward function with values generated from a shaping function.…”
Section: Introduction (mentioning)
confidence: 99%
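The excerpt above defines reward shaping as altering the original reward with values from a shaping function. A minimal sketch of the classic potential-based form of such a shaping function (due to Ng, Harada, and Russell) is given below; the potential phi is a placeholder assumption, not something specified in the excerpt or the indexed paper.

def shaped_reward(reward, state, next_state, phi, gamma=0.99):
    """Return r + F(s, s') with potential-based shaping F(s, s') = gamma * phi(s') - phi(s).

    This form leaves the optimal policy of the original MDP unchanged while
    giving the agent denser feedback during early exploration.
    """
    return reward + gamma * phi(next_state) - phi(state)

# Illustrative potential for a localization task (assumed, not from the paper):
# phi = lambda state: -distance_to_target(state)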