2022
DOI: 10.1016/j.future.2022.06.015

Target localization using Multi-Agent Deep Reinforcement Learning with Proximal Policy Optimization

Cited by 34 publications (8 citation statements)
References 25 publications
“…It is the average of the total distance to obstacles, angular jerk, linear jerk, and lane center offset for each episode. 4) Rules: it refers to the total number of rules violated, such as lane changing, Wrong Way, and Speed Overlimit, for each episode. It is important to note that since we take the average for each episode, there can be multiple agents (red cars) in a single episode. This leads to higher values in our evaluation, and the same holds for all other participating teams in the competition.…”
Footnote 6: Difference given here: https://stats.stackexchange.com/questions/184657/what-is-the-difference-between-off-policy-and-on-policy-learning
Footnote 7: Blog post for an introduction to Q-Learning: https://medium.com/intro-to-artificial-intelligence/q-learning-a-value-based-reinforcement-learning-algorithm-272706d835cf
Section: Results and Comparative Analysis (mentioning)
confidence: 99%
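The excerpt above describes how the competition metrics are aggregated per episode. Below is a minimal sketch of that aggregation, not the cited authors' evaluation code; the field names (distance_to_obstacles, rule_violations, and so on) are assumptions chosen for illustration.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AgentStep:
    # One simulation step recorded for one agent (red car); field names are assumed.
    distance_to_obstacles: float
    angular_jerk: float
    linear_jerk: float
    lane_center_offset: float
    rule_violations: int  # lane changing, Wrong Way, Speed Overlimit, ...

def episode_metrics(steps: List[AgentStep]) -> Dict[str, float]:
    """Aggregate the smoothness average and total rule violations over one episode.

    Steps from every agent in the episode are pooled, so episodes with more
    agents naturally report larger totals, as the excerpt notes.
    """
    smoothness = sum(
        s.distance_to_obstacles + s.angular_jerk + s.linear_jerk + s.lane_center_offset
        for s in steps
    ) / len(steps)
    rules = sum(s.rule_violations for s in steps)
    return {"smoothness": smoothness, "rules": float(rules)}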
“…Further, an adaptive clipping approach for PPO [5] was later developed by building on prior work. Since the inception of PPO, there have been various cooperative and multi-agent Proximal Policy Optimization implementations for use cases such as target localization [6], online scheduling for production [7], and healthcare [8]. Moreover, convolutions have been combined with RL approaches, as in CMAPPO (Convolutional Multi-Agent PPO) [9], which learns to explore a new environment effectively by combining convolutions (for RGBD+ information), curriculum-based learning, and motivation-based reinforcement learning.…”
Section: A Multi-agent Proximal Policy Optimization (mentioning)
confidence: 99%
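The excerpt refers to PPO's clipping mechanism and the multi-agent variants built on it. For reference, a minimal sketch of the standard clipped surrogate loss those variants share is given below; it is not the implementation from the indexed paper or the cited works, and the tensor names are assumptions.

import torch

def ppo_clip_loss(log_probs_new: torch.Tensor,
                  log_probs_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate: L = -E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    ratio = torch.exp(log_probs_new - log_probs_old)  # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negated because optimizers minimize, while PPO maximizes the surrogate.
    return -torch.min(unclipped, clipped).mean()

The adaptive clipping approach cited as [5] presumably varies this clipping range during training rather than keeping clip_eps fixed.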
“…[16], BGGIW is adopted to approximate the target birth intensity and the potential target intensity. There are also other target tracking methods, such as energy-based auto-regressive neural systems [21,22], deep learning strategies [23,24], reinforcement learning [25-27], and genetic algorithms [28].…”
Section: Related Work (mentioning)
confidence: 99%
“…RL algorithms are slow to converge, as most of the time is spent on exploration in the early stages of learning. There are multiple learning speedup techniques for RL, such as offline learning, dynamic exploration, transfer learning, imitation learning, and reward shaping [23,1]. Reward shaping alters the original reward function with values generated from a shaping function.…”
Section: Introduction (mentioning)
confidence: 99%
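The excerpt above defines reward shaping as altering the original reward with values from a shaping function. A minimal sketch of the classic potential-based form of such a shaping function (due to Ng, Harada, and Russell) is given below; the potential phi is a placeholder assumption, not something specified in the excerpt or the indexed paper.

def shaped_reward(reward, state, next_state, phi, gamma=0.99):
    """Return r + F(s, s') with potential-based shaping F(s, s') = gamma * phi(s') - phi(s).

    This form leaves the optimal policy of the original MDP unchanged while
    giving the agent denser feedback during early exploration.
    """
    return reward + gamma * phi(next_state) - phi(state)

# Illustrative potential for a localization task (assumed, not from the paper):
# phi = lambda state: -distance_to_target(state)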