A flying ad hoc network (FANET) is an application of the 5G access network that consists of unmanned aerial vehicles (UAVs), or flying nodes, with scarce resources and high mobility rates. This paper proposes a deep Q-network (DQN)-based vertical routing scheme that selects routes with higher residual energy levels and lower mobility rates across network planes (i.e., macro-plane, pico-plane, and femto-plane), an approach that has not been investigated in the literature. The main motivation behind this work is to address frequent link disconnections and network partitions in order to enhance network performance. The 5G access network has a central controller (CC) and distributed controllers (DCs) in different network planes. The proposed scheme is a hybrid approach in which the CC and DCs exchange information among themselves, with the CC handling global information and the DCs handling local information. The scheme is suitable for highly dynamic ad hoc FANETs, and it enables data communication between UAVs in various applications, such as border monitoring and surveillance, and target-based operations (e.g., object tracking). Vertical routing is performed over a clustered network, in which clusters are formed across different network planes to provide inter-plane and inter-cluster communications. This helps to offload data traffic across different network planes and thereby extend network lifetime. Compared to the traditional reinforcement learning approach, the proposed DQN-based vertical routing scheme has been shown to increase network lifetime by up to 60%, reduce energy consumption by up to 20%, and reduce the rate of link breakages by up to 50%.
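To make the route-selection idea concrete, the following is a minimal illustrative sketch, not the scheme proposed in this paper. It assumes a reward that grows with a link's residual energy and shrinks with its mobility rate, and it uses a tabular Q-learning update (the kind of traditional reinforcement learning baseline mentioned above) in place of a deep Q-network. The names `Link`, `reward`, `q_update`, and `select_next_hop`, the equal weights, and the learning parameters are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A candidate next hop (hypothetical structure for illustration)."""
    next_hop: str
    residual_energy: float  # assumed normalized to [0, 1]
    mobility_rate: float    # assumed normalized to [0, 1]


def reward(link, w_energy=0.5, w_mobility=0.5):
    """Assumed reward: favor high residual energy and low mobility."""
    return w_energy * link.residual_energy - w_mobility * link.mobility_rate


def q_update(q, state, link, next_links, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update for taking `link` from `state`.

    A DQN would replace the table `q` with a neural network that
    approximates the same Q-values.
    """
    key = (state, link.next_hop)
    # Best Q-value available from the next hop (0.0 if nothing is known).
    best_next = max(
        (q.get((link.next_hop, n.next_hop), 0.0) for n in next_links),
        default=0.0,
    )
    target = reward(link) + gamma * best_next
    old = q.get(key, 0.0)
    q[key] = old + alpha * (target - old)
    return q[key]


def select_next_hop(q, state, links):
    """Greedy selection: pick the link with the highest learned Q-value."""
    return max(links, key=lambda l: q.get((state, l.next_hop), 0.0)).next_hop


# Usage: two candidate next hops from a source UAV.
stable = Link("uav_a", residual_energy=0.9, mobility_rate=0.1)
unstable = Link("uav_b", residual_energy=0.3, mobility_rate=0.8)
q_table = {}
q_update(q_table, "src", stable, next_links=[])
q_update(q_table, "src", unstable, next_links=[])
chosen = select_next_hop(q_table, "src", [stable, unstable])
```

Under these assumed weights, the link with more residual energy and lower mobility receives the higher Q-value and is selected, which mirrors the selection criterion described above.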