In Public Safety Networks (PSNs), conserving the energy of on-scene devices is critical to ensuring long-term connectivity for first responders. Because these devices have limited transmit power, connectivity can be maintained by enabling continuous cooperation among them through multipath routing. In this paper, we present a Reinforcement Learning (RL) and Unmanned Aerial Vehicle (UAV)-aided multipath routing scheme for PSNs. The aim is to increase network lifetime by improving the Energy Efficiency (EE) of the PSN. First, network configurations are generated using different clustering schemes. RL is then applied to configure a routing topology that considers both the immediate energy cost and the total distance cost of the transmission path. The performance of these schemes is analyzed in terms of throughput, energy consumption, number of dead nodes, delay, packet delivery ratio, number of Cluster Head (CH) changes, number of control packets, and EE. The results show that the clustering scheme improves EE by approximately 42% compared with non-clustering schemes. Furthermore, the impact of the UAV trajectory and of the number of UAVs is jointly analyzed by considering various trajectory scenarios around the disaster area. The EE can be further improved by 27% using Two UAVs on Opposite Axis of the building and moving in the Opposite directions (TUOAO), compared with a single-UAV scheme. The results also show that although the number of control packets is comparable in the single-UAV and two-UAV scenarios, the total number of CH changes differs significantly.