Reinforcement learning (RL), which is a class of machine learning, provides a framework by which a system can learn from its previous interactions with its environment to efficiently select its actions in the future. RL has been used in a number of application fields, including game playing, robotics and control, networks, and telecommunications, for building autonomous systems that improve themselves with experience. It is commonly accepted that RL is suitable for solving optimization problems related to distributed systems in general and to routing in networks in particular. RL also has reasonable overheadin terms of control packets, memory and computation-compared to other optimization techniques used to solve the same problems. Since the mid-1990s, over 60 protocols have been proposed, with major or minor contributions in the field of optimal route selection to convey packets in different types of communication networks under various user QoS requirements. This paper provides a comprehensive review of the literature on the topic. The review is structured in a way that shows how network characteristics and requirements were gradually considered over time. Classification criteria are proposed to present and qualitatively compare existing RL-based routing protocols.INDEX TERMS Reinforcement learning, communication networks, routing protocols, path optimization, quality of service.