A flying ad hoc network (FANET), also known as a swarm of unmanned aerial vehicles (UAVs), can be deployed in a wide range of applications, including surveillance, monitoring, and emergency communications. UAVs must communicate in real time with one another and with the base station via an efficient routing protocol. However, designing an efficient multihop routing protocol for FANETs is challenging due to high mobility, dynamic topology, limited energy, and short transmission range. Recently, owing to the advantages of multi-objective optimization, Q-learning (QL)-based position-aware routing protocols have improved routing performance in FANETs. In this article, we provide a comprehensive review of existing QL-based position-aware routing protocols for FANETs. We first address dynamic topology, mobility models, and the relationship between QL and routing in FANETs, and then review the existing QL-based position-aware routing protocols in detail, along with their advantages and limitations. Next, we compare the reviewed protocols qualitatively in terms of operational features, characteristics, and performance metrics. Finally, we discuss important open issues and research challenges, together with potential research directions.