Vehicular and flying ad hoc networks (VANETs and FANETs) are becoming increasingly important with the development of smart cities and intelligent transportation systems (ITSs). The high mobility of nodes in these networks leads to frequent link breaks, which complicates the discovery of an optimal route from source to destination and degrades network performance. One way to overcome this problem is to use machine learning (ML) in the routing process, and the most promising ML approach for this purpose is reinforcement learning (RL). Although there are several surveys on RL-based routing protocols for VANETs and FANETs, the important issue of integrating RL with well-established modern technologies, such as software-defined networking (SDN) or blockchain, has not been adequately addressed, especially for their use in complex ITSs. In this paper, we present a comprehensive categorisation of RL-based routing protocols for both network types, taking into account their simultaneous use and their integration with other technologies. A detailed comparative analysis of the protocols is carried out based on the different factors that influence the reward function in RL and the consequences they have on network performance. The key advantages and limitations of RL-based routing are also discussed in detail.