In recent years, advances in high-definition video coding, high-precision satellite navigation, mobile base station positioning, and broadband communication have greatly improved the performance of unmanned aerial vehicles (UAVs). In the military field, drones have become an important battlefield weapon alongside missiles, and future military drones are expected to perform strategic missions such as battlefield reconnaissance and long-range strikes. In civilian use, DJI drones carrying Zenmuse series gimbal cameras are used for filming, DJI MG series drones are used for pesticide spraying, and Beijing Zhonghangzhi unmanned helicopters are used for geological surveys, precise inspection of power lines, and maritime law enforcement. As technical requirements continue to rise, UAV communication technology requires further research and development. This paper studies the optimization of multi-UAV communication networks based on reinforcement learning and evaluates it experimentally. In the experiments, performance is measured by the Age of Information (AoI) recorded when a given task is completed. The communication trajectory learned by reinforcement learning reaches a final AoI of 115 seconds, whereas the AoI-greedy strategy reaches 140 seconds, a reduction of about 18% in total AoI that effectively improves system performance. These results indicate that reinforcement learning is a promising approach for UAV communication.
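
The reported reduction can be checked with a short calculation. The sketch below assumes that both final AoI values are in seconds and that the reduction is measured relative to the AoI-greedy baseline; the function name `relative_reduction` is illustrative and does not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the fractional reduction of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Final AoI values reported in the abstract (assumed to share one unit, seconds).
aoi_greedy = 140.0  # AoI-greedy strategy
aoi_rl = 115.0      # reinforcement learning trajectory

reduction = relative_reduction(aoi_greedy, aoi_rl)
print(f"AoI reduction: {reduction:.1%}")  # prints "AoI reduction: 17.9%"
```

Under these assumptions the reduction is 25/140, or roughly 18%, which matches the figure stated in the abstract.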