Unmanned aerial vehicles (UAVs) are regarded as an effective technology for future wireless networks. However, due to the non-convex nature of the joint trajectory design and power allocation (JTDPA) problem, it is challenging to attain the optimal joint policy in multi-UAV networks. In this paper, a multi-agent deep reinforcement learning approach is presented to maximize the long-term network utility while satisfying the quality-of-service requirements of the user equipments. Since the utility of each UAV depends on both the network environment and the actions of the other UAVs, the JTDPA problem is modeled as a stochastic game. To handle the high computational complexity caused by the continuous action space and large state space, a multi-agent deep deterministic policy gradient (MADDPG) method is proposed to obtain the optimal policy for the JTDPA problem. Numerical results indicate that the proposed method achieves higher network utility and system capacity than other optimization methods in multi-UAV networks, with lower computational complexity.

INDEX TERMS: UAV networks, trajectory design, power allocation, multi-agent deep reinforcement learning.
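To make the deterministic policy gradient idea behind MADDPG concrete, the following is a minimal single-agent sketch of one actor update against a learned critic, using toy linear networks in NumPy. All names and dimensions (e.g. a 4-dimensional state, a 2-dimensional heading/power action) are illustrative assumptions, not taken from the paper; in the full MADDPG scheme each agent's critic would additionally observe the states and actions of the other UAVs.

```python
import numpy as np

# Toy sketch of one deterministic policy-gradient step for a single agent.
# Dimensions and the linear actor/critic forms are illustrative assumptions.
rng = np.random.default_rng(0)

state_dim, action_dim = 4, 2   # e.g. position features; action = (heading, tx power)
W = rng.normal(size=(action_dim, state_dim)) * 0.1   # linear actor: a = W @ s
w = rng.normal(size=state_dim + action_dim) * 0.1    # linear critic: Q = w . [s, a]

def actor(s):
    return W @ s

def critic(s, a):
    return w @ np.concatenate([s, a])

s = rng.normal(size=state_dim)
lr = 0.01

# Deterministic policy gradient: grad_W J = grad_a Q(s, a) (outer) s
dQ_da = w[state_dim:]            # gradient of the linear critic w.r.t. the action
q_before = critic(s, actor(s))
W += lr * np.outer(dQ_da, s)     # ascend the critic's action-value estimate
q_after = critic(s, actor(s))
```

Because the toy critic is linear in the action, this single ascent step provably does not decrease the critic's value estimate; with neural networks the same update is applied through backpropagation.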
Traffic offloading is considered a promising technology in unmanned aerial vehicle (UAV)-assisted cellular networks. Because UAVs are selfish, they may be reluctant to take part in traffic offloading without an incentive. Moreover, considering the dynamic positions of UAVs and the time-varying condition of the transmission channel, it is challenging to design a long-term effective incentive mechanism for multi-UAV networks. In this work, a dynamic contract incentive approach is studied to attract UAVs to participate in traffic offloading effectively. A two-stage contract incentive method is introduced for both the information-symmetric and the information-asymmetric scenarios. Based on the sufficient and necessary conditions of the contract design, a sequence optimization algorithm is investigated to maximize the expected utility of the base station. Simulation results show that the designed two-stage dynamic contract improves the performance of traffic offloading effectively.
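The feasibility conditions mentioned above are, in standard contract theory, the individual-rationality (IR) and incentive-compatibility (IC) constraints. The sketch below checks them for a small discrete-type contract menu; the three UAV types, the linear utility form, and the menu values are illustrative assumptions, not the paper's model.

```python
from fractions import Fraction as F

# Hedged sketch: verifying individual rationality (IR) and incentive
# compatibility (IC) for a discrete-type contract menu.  Types, the utility
# form, and the menu entries are illustrative assumptions.
types = [F(1), F(2), F(3)]                 # UAV offloading-efficiency types (low to high)
menu = {F(1): (F(1), F(1)),                # type -> (offloaded traffic q, reward r)
        F(2): (F(2), F(3, 2)),
        F(3): (F(3), F(11, 6))}

def utility(theta, q, r):
    """UAV utility: reward minus type-scaled offloading cost q / theta."""
    return r - q / theta

def is_feasible(menu, types):
    ir = all(utility(t, *menu[t]) >= 0 for t in types)       # IR: accept vs. opt out
    ic = all(utility(t, *menu[t]) >= utility(t, *menu[s])    # IC: truthful reporting
             for t in types for s in types)                  #     is a best response
    return ir and ic
```

The base station would then search over feasible menus of this kind (in each of the two stages) for the one maximizing its own expected utility; the menu above is constructed so that the IR constraint binds for the lowest type and the downward IC constraints bind between adjacent types, the usual pattern in screening problems.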
Heterogeneous networks (HetNets) can balance traffic loads and cut down the cost of deploying cells. Thus, they are regarded as a key technique of next-generation communication networks. Due to the non-convex nature of the channel allocation problem in HetNets, it is difficult to design an optimal channel allocation approach. To guarantee user quality of service as well as the long-term total network utility, this article proposes a new method based on multi-agent reinforcement learning. Moreover, to address the computational complexity caused by the large action space, deep reinforcement learning is employed to learn the optimal policy. This learning method obtains a near-optimal solution with high efficiency and rapid convergence. Simulation results reveal that the new method outperforms other methods.
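As a minimal illustration of the reinforcement-learning loop underlying such channel allocation, the sketch below runs tabular (bandit-style) Q-learning for a single agent choosing among a few channels, one of which is persistently occupied by another cell. The reward model and the stateless simplification are assumptions standing in for the deep multi-agent method the abstract describes.

```python
import numpy as np

# Minimal sketch: stateless Q-learning for channel selection in a toy HetNet.
# The collision-penalty reward model is an illustrative assumption.
rng = np.random.default_rng(1)
n_channels = 4
Q = np.zeros(n_channels)        # one Q-value per channel
alpha, eps = 0.1, 0.1           # learning rate, exploration probability
busy = 2                        # channel persistently occupied by another cell

for step in range(2000):
    # epsilon-greedy channel choice
    a = rng.integers(n_channels) if rng.random() < eps else int(np.argmax(Q))
    r = -1.0 if a == busy else 1.0         # collision penalty vs. successful slot
    Q[a] += alpha * (r - Q[a])             # one-step Q-update toward the reward

best = int(np.argmax(Q))        # learned policy avoids the busy channel
```

In the multi-agent deep version, the table is replaced by a neural network over a large state space and each cell learns concurrently, but the update rule follows the same temporal-difference pattern.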