Offline reinforcement learning leverages previously collected offline datasets to learn optimal policies without needing to access the real environment. Such a paradigm is also desirable for multi-agent reinforcement learning (MARL) tasks, given the increased interactions among agents and with the environment. Yet, in MARL, the paradigm of offline pre-training with online fine-tuning has not been studied, nor are datasets or benchmarks for offline MARL research available. In this paper, we facilitate the research by providing large-scale datasets, and use them to examine the usage of the Decision Transformer in the context of MARL. We investigate the generalisation of MARL offline pre-training in the following three aspects: 1) between single agents and multiple agents, 2) from offline pre-training to online fine-tuning, and 3) to multiple downstream tasks with few-shot and zero-shot capabilities. We start by introducing the first offline MARL dataset with diverse quality levels based on the StarCraft II environment, and then propose the novel architecture of the multi-agent decision transformer (MADT) for effective offline learning. MADT leverages the transformer's sequence-modelling ability and integrates it seamlessly with both offline and online MARL tasks. A crucial benefit of MADT is that it learns generalisable policies that can transfer between different types of agents under different task scenarios. On the StarCraft II offline dataset, MADT outperforms the state-of-the-art offline RL baselines. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency and achieves strong performance in both few-shot and zero-shot cases. To the best of our knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalisability enhancements in MARL.
We investigate the coherent control of the transmission spectrum in a cavity magnomechanical system consisting of microwave photon, magnon, and phonon modes, where the microwave cavity is driven by a strong pump field and a weak probe field, and the magnon is driven by a weak microwave source. In contrast to the single transparency window seen in the absence of the phonon–magnon interaction, two transparency windows and three absorption dips can be observed in its presence, which originate from the joint effect of the phonon–magnon and photon–magnon interactions. In addition, the two absorption dips located on either side of the central absorption dip can be modulated asymmetrically into amplification and absorption by varying the magnetic field amplitude of the magnon driving field. Interestingly, when the magnetic field amplitude of the magnon driving field is chosen appropriately, the relative phase of the applied fields has a profound effect on both the transmission spectrum and the group delay of the output field. The transmission group delay can be switched between positive and negative by adjusting the relative phase between the applied fields. The present results illustrate the potential of using the relative phase to control the microwave signal in the cavity magnomechanical system, and provide guidance for the design of information transduction and quantum sensing.
We propose a scheme to realize a single-photon diode and circulator using two waveguides chirally coupled to a five-level M-type atom. Two external control fields are introduced to drive the emitter. Non-reciprocal single-photon propagation can be achieved in our scheme, which underpins the single-photon diode and circulator. The single-photon diode can work well at specific frequency points of the incident photon simultaneously. These frequency points can be tuned by modulating the Rabi frequencies and detunings of the control fields. In addition, the non-reciprocity of the photon propagation can be turned off or on by turning the control fields off or on. The scheme can also act as a single-frequency filter, filtering out a specific frequency of the incident photon by adjusting the detunings of the control fields. The single-photon circulator can realize photon propagation along the pathways 1 → 2, 2 → 3, 3 → 4, 4 → 1 with 100% probability. The properties of the circulator can be modulated by adjusting the Rabi frequencies and detunings of the control fields.
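The routing pattern described above (1 → 2, 2 → 3, 3 → 4, 4 → 1 with 100% probability) can be illustrated with a minimal sketch. This is only an idealized routing table, not the scattering calculation from the paper; the names `ROUTE`, `transmission_probability`, `scattering_matrix`, and `is_reciprocal` are hypothetical and introduced purely for illustration.

```python
# Idealized four-port circulator (an illustrative assumption, not the
# paper's model): each input port routes to exactly one output port.
PORTS = [1, 2, 3, 4]
ROUTE = {1: 2, 2: 3, 3: 4, 4: 1}  # pathways quoted in the abstract


def transmission_probability(src, dst):
    """Probability that a photon entering at port src leaves at port dst."""
    return 1.0 if ROUTE[src] == dst else 0.0


def scattering_matrix():
    """Matrix of probabilities; row index = output port, column = input port."""
    return [[transmission_probability(i, j) for i in PORTS] for j in PORTS]


def is_reciprocal(matrix):
    """A reciprocal device has the same probability for i -> j and j -> i."""
    n = len(matrix)
    return all(matrix[a][b] == matrix[b][a] for a in range(n) for b in range(n))


S = scattering_matrix()
```

In this idealization every column of `S` sums to one (the photon always leaves through some port), and `is_reciprocal(S)` is false because the 1 → 2 pathway is open while 2 → 1 is closed, which is the defining feature of a circulator.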
We propose a potentially practical scheme for controllable single-photon transport via waveguides coupled to a microcavity-emitter system. The microcavity-emitter system consists of a V-type three-level emitter and one or two single-mode microcavities. A driving field is used to drive a hyperfine transition between the two upper excited states of the V-type three-level emitter. Beyond chiral coupling between the waveguides and the microcavity-emitter system, we show that perfectly nonreciprocal single-photon transport in a single waveguide and a single-photon router with 100% routing probability in two waveguides can be achieved. Interestingly, both the nonreciprocal single-photon transport and the single-photon router can be switched periodically by adjusting the phase associated with the microcavity-emitter coupling strength and the driving field. A complete physical explanation of the underlying mechanism is presented.