Recent advances in multi-agent reinforcement learning have been largely limited to training one model from scratch for every new task. This limitation stems from model architectures with fixed input and output dimensions, which hinders experience accumulation and transfer of learned agents across tasks of diverse difficulty (e.g., 3 vs 3 or 5 vs 6 multi-agent games). In this paper, we make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture that fits tasks with different observation and action configuration requirements. Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy by decoupling the policy distribution from the intertwined input observation, using importance weights determined with the aid of the self-attention mechanism. Compared to a standard transformer block, the proposed model, which we name Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent decision process more explainable. UPDeT is general enough to be plugged into any multi-agent reinforcement learning pipeline and to equip it with strong generalization abilities, enabling multiple tasks to be handled at a time. Extensive experiments on large-scale SMAC multi-agent competitive games demonstrate that the proposed UPDeT-based multi-agent reinforcement learning achieves significant improvements over state-of-the-art approaches, exhibiting advantageous transfer capability in terms of both performance and training speed (10 times faster). Code is available at https://github.com/hhhusiyi-monash/UPDeT
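As a rough illustration of the policy-decoupling idea described above, the sketch below encodes each observed entity as a token, applies self-attention, and reads basic actions from the agent's own token while pairing one interactive-action logit with each other entity. The class name, heads, and dimensions are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of a UPDeT-style policy layer (assumed structure, not the official code).
import torch
import torch.nn as nn

class UPDeTSketch(nn.Module):
    def __init__(self, obs_dim, embed_dim=32, n_heads=4, n_own_actions=6):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)            # shared per-entity encoder
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.own_head = nn.Linear(embed_dim, n_own_actions)   # basic actions from the agent's own token
        self.pair_head = nn.Linear(embed_dim, 1)              # one interactive action per other entity

    def forward(self, entity_obs):
        # entity_obs: (batch, n_entities, obs_dim); n_entities may vary across tasks
        tokens = self.embed(entity_obs)
        ctx, _ = self.attn(tokens, tokens, tokens)            # self-attention mixes entity information
        own_logits = self.own_head(ctx[:, 0])                 # token 0: the agent itself
        pair_logits = self.pair_head(ctx[:, 1:]).squeeze(-1)  # one logit per other entity (e.g. attack target)
        return torch.cat([own_logits, pair_logits], dim=-1)   # action space grows with entity count
```

Because the action logits are tied to entity tokens rather than to a fixed output layer, the same weights can be reused when the number of entities changes between tasks.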
In this paper, we focus on the task of multi-view multi-source geo-localization, which serves as an important auxiliary method to GPS positioning by matching drone-view images against satellite-view images annotated with GPS tags. To solve this problem, most existing methods adopt a metric loss with a weighted classification block to force a common feature space shared by different viewpoints and view sources. However, these methods fail to pay sufficient attention to spatial information (especially viewpoint variances). To address this drawback, we propose an elegant orientation-based method to align the patterns and introduce a new branch to extract aligned partial features. Moreover, we provide a style alignment strategy to reduce the variance in image style and enhance feature unification. To demonstrate the performance of the proposed approach, we conduct extensive experiments on a large-scale benchmark dataset. The experimental results confirm the superiority of the proposed approach compared to state-of-the-art alternatives.
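The abstract does not detail the partial-feature branch, but one plausible reading is that the aligned partial features are pooled from viewpoint-robust regions of a CNN feature map. The sketch below pools concentric square rings around the image centre; the function name and partition scheme are assumptions for illustration only.

```python
# Illustrative sketch of part-based pooling (assumed design, not the paper's exact branch).
import torch

def ring_partition_pool(feat, n_rings=4):
    """feat: (batch, C, H, W) -> (batch, n_rings, C), averaged over concentric square rings."""
    b, c, h, w = feat.shape
    ys = torch.arange(h).view(h, 1).expand(h, w)
    xs = torch.arange(w).view(1, w).expand(h, w)
    # normalised Chebyshev distance of each cell from the image centre
    dist = torch.maximum((ys - (h - 1) / 2).abs() / (h / 2),
                         (xs - (w - 1) / 2).abs() / (w / 2))
    ring_id = (dist * n_rings).clamp(max=n_rings - 1).long()    # (H, W)
    pooled = []
    for r in range(n_rings):
        mask = (ring_id == r).to(feat.dtype)                     # (H, W)
        area = mask.sum().clamp(min=1.0)
        pooled.append((feat * mask).sum(dim=(2, 3)) / area)      # (batch, C)
    return torch.stack(pooled, dim=1)                            # (batch, n_rings, C)
```

Ring-shaped parts are centred on the target location, so they remain comparable when the drone view is rotated relative to the satellite view.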
We present a Transformer-like agent that learns the policy for multi-agent cooperation tasks, a departure from traditional RNN-based multi-agent models, which must be retrained for each task. Our model handles varying input and output dimensions with strong transferability and can tackle different tasks in parallel. Moreover, we are the first to successfully integrate a transformer into a recurrent architecture, providing insight into stabilizing transformers in recurrent RL tasks.
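A minimal sketch of how a transformer can be used recurrently is given below, assuming the first output token is carried forward as the hidden state for the next timestep; the exact mechanism and all names here are assumptions, not the paper's verified design.

```python
# Sketch of a recurrent transformer step for an RL agent (assumed mechanism).
import torch
import torch.nn as nn

class RecurrentTransformerAgent(nn.Module):
    def __init__(self, obs_dim, embed_dim=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

    def forward(self, entity_obs, hidden):
        # entity_obs: (batch, n_entities, obs_dim); hidden: (batch, 1, embed_dim)
        tokens = torch.cat([hidden, self.embed(entity_obs)], dim=1)
        ctx, _ = self.attn(tokens, tokens, tokens)
        new_hidden = ctx[:, :1]          # first output token becomes the next step's hidden state
        return ctx, new_hidden           # ctx feeds the policy heads, new_hidden is passed forward
```

The hidden token plays the role of the GRU state in a DRQN-style training loop, so the agent can be dropped into existing recurrent value-decomposition pipelines.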