The Internet of Things (IoT) is growing in popularity, and its network architecture is complex because of the enormous number of connected objects. Sensors used in different IoT applications are often installed in unfavorable terrain and conditions. Since each sensor node can sense, compute, and communicate wirelessly, a novel intelligent routing algorithm is required, as traditional ones do not meet current network requirements. Reinforcement learning models can help overcome the routing challenges created by a wireless network's dynamic nature by selecting and adapting weights that optimize paths according to application requirements and operating conditions. This article proposes a Q-learning routing agent that adjusts the network's routing policy using local information, converging toward an optimal solution while balancing latency against network lifetime. The agent is rewarded for increasing network lifetime and reducing average network latency. The proposed model was evaluated in the NS-2 network simulator on different network scenarios, and it showed improved network lifetime compared with centralized minimum angle and distributed minimum angle.
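To make the idea of a Q-learning routing agent concrete, the following is a minimal illustrative sketch, not the article's implementation. It assumes a tabular agent per node that keeps one Q-value per neighbor and a reward that is a weighted difference of the next hop's residual energy (a stand-in for network lifetime) and normalized link latency. The class `RoutingAgent`, the function `reward`, the weights `W_LIFE` and `W_LAT`, and all parameter values are hypothetical and chosen only for illustration.

```python
import random

ALPHA = 0.5    # learning rate (assumed value)
GAMMA = 0.9    # discount factor (assumed value)
W_LIFE = 0.5   # weight on residual energy, a proxy for lifetime (assumed)
W_LAT = 0.5    # weight on latency (assumed)


def reward(residual_energy, link_latency, max_latency):
    """Hypothetical reward: favors energy-rich next hops and fast links.

    residual_energy is a fraction in [0, 1]; latency is normalized
    by max_latency so both terms are on a comparable scale.
    """
    return W_LIFE * residual_energy - W_LAT * (link_latency / max_latency)


class RoutingAgent:
    """One agent per node, using only local (per-neighbor) information."""

    def __init__(self, neighbors, epsilon=0.1, rng=None):
        self.neighbors = list(neighbors)
        self.q = {n: 0.0 for n in self.neighbors}  # Q-value per next hop
        self.epsilon = epsilon                     # exploration probability
        self.rng = rng or random.Random(0)

    def choose_next_hop(self):
        # Epsilon-greedy: explore occasionally, otherwise take the best hop.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.neighbors)
        return max(self.neighbors, key=lambda n: self.q[n])

    def best_value(self):
        # Value this node would report to an upstream neighbor.
        return max(self.q.values()) if self.q else 0.0

    def update(self, next_hop, r, next_hop_best_value):
        # Standard Q-learning update toward r + gamma * max Q at the next hop.
        td_target = r + GAMMA * next_hop_best_value
        self.q[next_hop] += ALPHA * (td_target - self.q[next_hop])


agent = RoutingAgent(["a", "b"], epsilon=0.0)
agent.update("a", reward(1.0, 10.0, 100.0), 0.0)  # energy-rich, fast link
agent.update("b", reward(0.2, 90.0, 100.0), 0.0)  # depleted, slow link
print(agent.choose_next_hop())
```

In this sketch, changing the ratio of `W_LIFE` to `W_LAT` shifts the learned policy between preserving node energy and minimizing delay, which is one simple way to express the balance between latency and lifetime described above.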