Summary
Unmanned aerial vehicle (UAV)‐aided aerial base stations have emerged as a promising technique for providing rapid, on‐demand wireless coverage to ground devices in a given geographical area. However, existing works on UAV‐enabled wireless communication systems overlook the optimal deployment of UAVs under quality of service (QoS)‐aware device‐to‐device (D2D) communication. This work therefore proposes a UAV‐supported self‐organized device‐to‐device (USSD2D) network that employs multiple UAVs as relays for reliable D2D data transmission. The aim is to maximize the total instantaneous transmission rate of the USSD2D network by jointly optimizing the association of devices with UAVs, the UAVs' channel selection, and the UAVs' deployment locations, subject to a signal‐to‐interference‐plus‐noise ratio (SINR) threshold. Because this joint optimization problem is nonconvex and combinatorial, it is transformed into a Markov decision process (MDP) that effectively splits it into three individual optimization subproblems: device association, UAV channel selection, and UAV location at each time instant. Finally, a reinforcement learning (RL) approach based on a low‐complexity iterative state–action–reward–state–action (SARSA) algorithm is developed to update the UAVs' policy and solve the formulated problem. The UAVs adapt the system parameters according to the current state and the corresponding action so as to maximize the long‐term discounted reward generated under the current policy, without prior knowledge of the environment. Numerical results validate the proposed approach and provide various insights into optimal UAV deployment. This investigation demonstrates that the total instantaneous transmission rate of the USSD2D network can be improved by 75.25%, 51.31%, and 13.96% with respect to the RS‐FORD, ES‐FIRD, and AOIV schemes, respectively.
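For readers unfamiliar with SARSA, the on‐policy update that the summary refers to can be illustrated with a minimal tabular sketch. This is not the algorithm developed in this work: the environment below (a single UAV moving along five positions, with a reward that peaks at one position) and all names and parameter values (`toy_step`, `run_sarsa`, `alpha`, `gamma`, `epsilon`) are hypothetical and chosen only to show the state–action–reward–state–action update under an epsilon‐greedy policy.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)  # hypothetical moves: left, hover, right


def toy_step(state, action):
    """Hypothetical environment: positions 0..4, reward peaks at position 3."""
    next_state = min(4, max(0, state + action))
    reward = -abs(next_state - 3)  # stand-in for a transmission-rate reward
    return next_state, reward


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


def run_sarsa(step, start_state, actions, episodes=200, horizon=20,
              alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular SARSA: the next action is drawn from the same policy being learned."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table, all entries start at 0
    for _ in range(episodes):
        s = start_state
        a = epsilon_greedy(q, s, actions, epsilon, rng)
        for _ in range(horizon):
            s_next, reward = step(s, a)
            a_next = epsilon_greedy(q, s_next, actions, epsilon, rng)
            sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma)
            s, a = s_next, a_next
    return q
```

Calling `run_sarsa(toy_step, 0, ACTIONS)` returns the learned Q‐table for this toy setting. In the USSD2D problem, the state and action would instead encode device association, channel selection, and UAV location, which this sketch does not attempt to model.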