Internet of Things (IoT) devices are increasingly popular due to their wide array of application domains. In IoT networks, sensor nodes are often connected in the form of a mesh topology and deployed in large numbers. Managing these resource-constrained small devices is complex and can lead to high system costs. A number of standardized protocols have been developed to handle the operation of these devices. For example, in the network layer, these small devices cannot run traditional routing mechanisms that require large computing powers and overheads. Instead, routing protocols specifically designed for IoT devices, such as the routing protocol for low-power and lossy networks, provide a more suitable and simple routing mechanism. However, they incur high overheads as the network expands. Meanwhile, reinforcement learning (RL) has proven to be one of the most effective solutions for decision making. RL holds significant potential for its application in IoT device’s communication-related decision making, with the goal of improving performance. In this paper, we explore RL’s potential in IoT devices and discuss a theoretical framework in the context of network layers to stimulate further research. The open issues and challenges are analyzed and discussed in the context of RL and IoT networks for further study.