Experience replay is crucial for off-policy reinforcement learning (RL) methods. By storing and reusing experiences generated by past policies, experience replay significantly improves the training efficiency and stability of RL algorithms. Many practical decision-making problems naturally involve multiple agents and require multi-agent reinforcement learning (MARL) under the centralized training with decentralized execution paradigm. Nevertheless, existing MARL algorithms often adopt standard experience replay, where transitions are sampled uniformly regardless of their importance. Finding prioritized sampling weights optimized for MARL experience replay remains unexplored. To this end, we propose MAC-PO, which formulates optimal prioritized experience replay for multi-agent problems as a regret minimization over the sampling weights of transitions. This optimization is relaxed and solved with the Lagrangian multiplier approach to obtain closed-form optimal sampling weights. By minimizing the resulting policy regret, we narrow the gap between the current policy and a nominal optimal policy, thus obtaining an improved prioritization scheme for multi-agent tasks. Experimental results on the Predator-Prey and StarCraft Multi-Agent Challenge environments demonstrate the effectiveness of our method, which better replays important transitions and outperforms other state-of-the-art baselines.
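For context on the prioritization scheme described above, the following is a minimal sketch of non-uniform replay sampling, assuming a simple buffer with one scalar weight per transition. The class name, capacity handling, and default weight are illustrative assumptions; MAC-PO's actual per-transition weights come from the closed-form solution derived later in the paper.

```python
import numpy as np


class WeightedReplayBuffer:
    """Minimal replay buffer with non-uniform (prioritized) sampling.

    The weights stored here are placeholders; in MAC-PO they would be the
    closed-form optimal sampling weights obtained from regret minimization.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []   # stored transitions
        self.weights = []   # one sampling weight per stored transition

    def add(self, transition, weight=1.0):
        # Drop the oldest entry once the buffer is full.
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
            self.weights.pop(0)
        self.storage.append(transition)
        self.weights.append(weight)

    def sample(self, batch_size):
        # Sample indices in proportion to the stored weights,
        # rather than uniformly as in standard experience replay.
        probs = np.asarray(self.weights, dtype=np.float64)
        probs /= probs.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx]
```

Replacing the placeholder weights with weights computed from the learner's current state is what turns this uniform-style buffer into a prioritized one; the abstract's claim is that choosing those weights by regret minimization yields better multi-agent training than heuristic priorities.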