In this paper, we evaluate the performance of two multiagent-reinforcement-learning-based training methods for swarm combat: pure multiagent reinforcement learning (MARL), and multiagent reinforcement learning combined with behavior cloning (MARL-BC). The behavior-cloning expert is a well-trained model taken from the final, steady phase of MARL training. In terms of win rate, the performance of the two training methods falls into three phases. In the first phase, both methods learn slowly. As the MARL-trained model grows stronger, the expert's experience gradually becomes useful, and the second phase begins, in which MARL-BC holds a clear advantage. Surprisingly, this advantage disappears in the final phase, because the behavior-cloning expert can no longer supply the right strategy in the face of the ever-changing environment and opponent.
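The MARL-BC combination described above is typically realized by adding an imitation term to the RL objective: the policy is trained on its usual reinforcement-learning loss plus a cross-entropy penalty toward the expert's actions. The following is a minimal sketch of that idea; the function names and the fixed weight `bc_weight` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def bc_loss(policy_logits, expert_actions):
    """Cross-entropy between the learner's policy and the expert's actions."""
    # Numerically stable log-softmax over the action dimension.
    shifted = policy_logits - policy_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the expert's chosen actions.
    return -log_probs[np.arange(len(expert_actions)), expert_actions].mean()

def marl_bc_loss(rl_loss, policy_logits, expert_actions, bc_weight=0.5):
    """Total loss: the RL objective plus a weighted behavior-cloning term."""
    return rl_loss + bc_weight * bc_loss(policy_logits, expert_actions)

# Example: 4 states, 3 actions, uniform policy (all-zero logits).
logits = np.zeros((4, 3))
experts = np.array([0, 1, 2, 0])
total = marl_bc_loss(rl_loss=1.0, policy_logits=logits, expert_actions=experts)
```

Annealing `bc_weight` toward zero as training progresses is one natural way to address the third phase observed above, where the fixed expert's advice stops matching the evolving opponent.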