Wireless sensor networks (WSNs) have gained importance in recent years for data acquisition and monitoring in industry, automotive, military, agriculture, health care, and other sectors. A WSN consists of a large number of low-power nodes, each equipped with the sensing, computational, and wireless communication capabilities needed to exchange data with other nodes or with a base station (BS). LEACH is a hierarchical routing protocol that organizes nodes into clusters with the objective of distributing the energy load evenly. However, LEACH has two drawbacks: the cluster head (CH) is selected by random probability, and data is sent to the BS in a single hop, which is inefficient in large networks. Recent extensions integrate k-means clustering and Q-learning, a reinforcement learning technique, for CH selection and data routing. In particular, Q-learning improves the reliability and flexibility of a network because it learns the best routes for multi-hop communication. Q-learning based routing in WSNs is assessed here in terms of energy depletion rate, node lifetime, and packet delivery ratio. Theoretical analysis shows that the Q-learning based algorithm outperforms traditional approaches such as LEACH and k-means, with better energy utilization, lower node mortality, and higher throughput. Overall, this study demonstrates the potential of Q-learning to extend WSN lifetime and improve efficiency, and suggests that it could be considered an optimal solution for the changing conditions and limited resources of WSNs.
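As an illustration of how Q-learning can learn multi-hop routes, the following minimal sketch applies the standard tabular Q-learning update to next-hop selection. It is not the implementation evaluated in this study: the function names `q_update` and `choose_next_hop`, the node labels, the reward value, and the parameters `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration. In practice the reward might reflect quantities such as residual energy or distance to the BS.

```python
import random

def q_update(q, node, next_hop, reward, neighbors_of_next, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update for forwarding from node to next_hop.

    q maps (node, next_hop) pairs to estimated values; missing pairs count as 0.
    The reward is supplied by the caller (hypothetical; e.g. energy-based).
    """
    # Best value achievable from next_hop onward (0 if it has no neighbors).
    best_next = max(
        (q.get((next_hop, n), 0.0) for n in neighbors_of_next), default=0.0
    )
    old = q.get((node, next_hop), 0.0)
    # Standard update: Q <- Q + alpha * (r + gamma * max Q' - Q)
    q[(node, next_hop)] = old + alpha * (reward + gamma * best_next - old)
    return q[(node, next_hop)]

def choose_next_hop(q, node, neighbors, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(neighbors)
    return max(neighbors, key=lambda n: q.get((node, n), 0.0))

# Usage: node "A" forwards via "B", which can reach the base station "BS".
q_table = {}
q_update(q_table, "A", "B", reward=1.0, neighbors_of_next=["BS"])
print(choose_next_hop(q_table, "A", ["B", "C"], epsilon=0.0))
```

With `epsilon=0.0` the choice is purely greedy, so node "A" picks the neighbor with the highest learned value. A nonzero `epsilon` lets nodes occasionally try other routes, which is what allows the table to adapt as node energy levels change.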