Unmanned Aerial Vehicles (UAVs) used in civilian applications such as emergency medical deliveries, precision agriculture, and wireless communication provisioning face limited flight time because they rely on on-board batteries. Efficient mechanisms for in situ power transfer to recharge UAV batteries therefore have the potential to extend mission time. In this paper, we study far-field wireless power transfer (WPT) from specialized transmitter UAVs (tUAVs) carrying Multiple Input Multiple Output (MIMO) antennas to receiver UAVs (rUAVs) in a mission. The tUAVs can fly and adjust their distance to the rUAVs to maximize energy transfer gain. The MIMO antennas further boost energy reception by narrowing the energy beam toward the rUAVs. The complexity of this dynamic operating environment increases with the growing number of tUAVs and rUAVs, whose energy consumption and residual power vary. We propose an intelligent trajectory selection algorithm for the tUAVs based on Proximal Policy Optimization (PPO), a deep reinforcement learning method, to optimize the energy transfer gain. Simulation results demonstrate that the PPO-based system achieves about a tenfold increase in flight time for a realistic set of values for transmit power, distance, number of sub-bands, and number of antennas. Further, PPO outperforms the benchmark tUAV movement strategies “Traveling Salesman Problem” and “Low Battery First”.
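
To illustrate why tUAV distance and antenna gain matter for far-field WPT, the following minimal Python sketch evaluates the standard Friis free-space equation, which is a textbook approximation and not necessarily the channel model used in this paper. It also shows one plausible reading of the “Low Battery First” benchmark, namely serving the rUAV with the least residual energy next. The function names, the dictionary layout, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def friis_received_power(p_tx_w, gain_tx, gain_rx, freq_hz, distance_m):
    """Received power in watts under the Friis free-space model.

    Assumption: line-of-sight, far-field propagation with linear (not dB)
    antenna gains. This is a generic model, not the paper's own.
    """
    wavelength_m = SPEED_OF_LIGHT_M_S / freq_hz
    path_factor = (wavelength_m / (4.0 * math.pi * distance_m)) ** 2
    return p_tx_w * gain_tx * gain_rx * path_factor


def low_battery_first(residual_energy_j):
    """Pick the rUAV id with the least residual energy.

    Assumption: this is one interpretation of the "Low Battery First"
    benchmark; the input maps a hypothetical rUAV id to energy in joules.
    """
    return min(residual_energy_j, key=residual_energy_j.get)


if __name__ == "__main__":
    # Hypothetical values: 10 W transmit power, 2.4 GHz carrier.
    far = friis_received_power(10.0, 8.0, 2.0, 2.4e9, distance_m=10.0)
    near = friis_received_power(10.0, 8.0, 2.0, 2.4e9, distance_m=5.0)
    print(f"received power at 10 m: {far:.3e} W")
    print(f"received power at 5 m:  {near:.3e} W")

    # Hypothetical residual energies for three rUAVs.
    print(low_battery_first({"rUAV-1": 420.0, "rUAV-2": 95.0, "rUAV-3": 310.0}))
```

Under this model, received power falls with the square of the distance, so halving the tUAV-to-rUAV separation quadruples the harvested power, and a narrower beam (a larger transmit gain) scales it linearly. A learned policy such as PPO would replace the fixed `low_battery_first` rule with a trajectory choice that trades off these effects across many rUAVs.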