“…The main practical motivation for deploying base stations in an ideal environment is to construct an airborne communication network that provides wireless signal coverage for the fire field [19, 20]. In recent years, many scholars have applied machine-learning methods, such as deep learning, reinforcement learning, and deep reinforcement learning, to control and path-planning problems in multi-agent systems, and have achieved some success [21, 22, 23]. However, major challenges remain in solving the control problem of multi-agent systems in complex, unknown, closed environments such as high-rise building fire fields.…”