Global Oceans 2020: Singapore – U.S. Gulf Coast 2020
DOI: 10.1109/ieeeconf38699.2020.9389128
Multi-Agent Reinforcement Learning for Dynamic Ocean Monitoring by a Swarm of Buoys

Abstract: Multi-agent pursuit-evasion tasks involving intelligent targets are notoriously challenging coordination problems. In this paper, we investigate new ways to learn such coordinated behaviors of unmanned aerial vehicles (UAVs) aimed at keeping track of multiple evasive targets. Within a Multi-Agent Reinforcement Learning (MARL) framework, we specifically propose a variant of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method. Our approach addresses multitarget pursuit-evasion scenarios within non…
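The abstract builds on MADDPG, whose core idea is centralized training with decentralized execution: each agent's actor acts on its own observation, while a critic used during training sees every agent's observation and action. The sketch below illustrates only that structure, not the paper's actual variant; the agent count, dimensions, and linear policies are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not taken from the paper.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2

# Each actor is a small linear policy over its OWN observation (decentralized execution).
actors = [rng.normal(size=(OBS_DIM, ACT_DIM)) * 0.1 for _ in range(N_AGENTS)]

# The centralized critic scores the JOINT observation-action vector (centralized training).
joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
critic_w = rng.normal(size=joint_dim) * 0.1

def act(obs):
    """Decentralized execution: agent i uses only its own observation obs[i]."""
    return [np.tanh(obs[i] @ actors[i]) for i in range(N_AGENTS)]

def q_value(obs, acts):
    """Centralized critic: sees every agent's observation and action."""
    joint = np.concatenate([np.concatenate([obs[i], acts[i]]) for i in range(N_AGENTS)])
    return float(joint @ critic_w)

obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
actions = act(obs)
print(q_value(obs, actions))
```

During training, each actor's gradient flows through this centralized critic, which is what lets MADDPG handle the non-stationarity that independent learners face in multi-agent settings.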

Cited by 19 publications (11 citation statements)
References 30 publications
“…To this end, we expect designers to train their agents in gradually more open environments using multi-agent reinforcement learning (MARL), allowing for the effects of dynamic environmental factors to be included in the training process. Indeed, several research groups have already started using MARL techniques to develop policies for their swarming agents in dynamic environments ( Kouzehgar et al., 2020 ; Wang et al., 2022a ; Wang et al., 2022b ; Kouzeghar et al., 2023 ). Learning from demonstration, experience replay, and transfer learning offer promising opportunities to exploit prior knowledge, e.g., from another domain or task ( Karimpanal and Bouffanais, 2018 ; Karimpanal and Bouffanais, 2019 ).…”
Section: Discussion
confidence: 99%
“…In our view, a true swarm should exhibit flexibility and adaptability in vastly different scenarios. Although seeking swarm intelligence in its most general sense is a laudable objective, it remains an extremely challenging task in practice, and multi-agent reinforcement learning (MARL) is attempting to do just that (Kouzehgar et al., 2020; Leonardos and Piliouras, 2021). Nonetheless, the design of a benchmark problem would offer the possibility to quantitatively compare the various approaches considered for this problem of exploration-exploitation balance.…”
Section: Discussion
confidence: 99%
“…[Flattened citation table: cited works grouped by deep RL algorithm family — Q-network variants (DQN, DDQN, QMIX), policy-gradient methods, DDPG, PPO, and others (TRPO, TD3, SAC).]…”
Section: Q-network
confidence: 99%
“…Not only with ground and aerial vehicles, MADRL has also been used for ocean monitoring with a team of floating buoys. Kouzehgar et al. [105] proposed two area-coverage approaches for such monitoring: (1) swarm-based (the robots follow simple swarming rules [164]) and (2) coverage-range-based (the robots have a fixed sensing radius). The swarm-based model was trained using MADDPG, while the coverage-range-based model was trained using a modified MADDPG algorithm in which reward sharing and the collective reward are eliminated, so that each agent senses only its own share of the reward function and learns independently from its individual reward.…”
Section: Multi-Robot System Applications of Multi-Agent Deep Reinforcement Learning
confidence: 99%
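The reward modification described in the excerpt above — replacing a shared collective reward with each agent's individual share — can be sketched as follows. The function names and coverage-gain values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def shared_reward(coverage_gains):
    """Standard collective reward: every buoy receives the team total."""
    total = float(np.sum(coverage_gains))
    return [total] * len(coverage_gains)

def individual_rewards(coverage_gains):
    """Modified scheme: each buoy senses only its own share of the reward."""
    return [float(g) for g in coverage_gains]

# Hypothetical per-buoy coverage gains for one timestep.
gains = np.array([1.0, 0.5, 0.0])
print(shared_reward(gains))       # -> [1.5, 1.5, 1.5]
print(individual_rewards(gains))  # -> [1.0, 0.5, 0.0]
```

Under the shared scheme an idle buoy is rewarded for its teammates' coverage, whereas the individual scheme gives each agent a reward signal tied to its own contribution, which is the independence property the modified MADDPG variant exploits.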