The exponential rise in the demands on wireless communication systems has pushed industry to pursue more efficient, quality‐of‐service (QoS)‐centric wireless communication networks. The decentralised, infrastructure‐less nature of wireless sensor networks (WSNs) makes them among the most sought‐after and widely used wireless networks globally. Their cost‐efficiency and functional robustness in low‐power lossy networks make them well suited to internet‐of‐things (IoT) applications. In recent years, IoT technologies have been applied in diverse domains, including Smart City Planning and Management (SCPM). Although mobile‐WSN has played a decisive role in IoT‐enabled SCPM, its routing optimality and power transmission have remained challenging. Notably, most existing research focuses on routing optimisation, and few efforts address dynamic power management (DPM) under non‐linear network conditions. Motivated by this gap, in this study a robust and efficient QoS‐centric reinforcement learning‐based DPM model is developed for mobile‐WSN to be used in SCPM. Unlike classical reinforcement learning methods, the authors' proposed model exploits both known and unknown network parameters and state‐activity values, including bit‐error probability, channel state information, holding time, and buffer cost, to make dynamic switching decisions. The key objective of the proposed model is to ensure optimal QoS‐oriented DPM and adaptive switching control, yielding reliable transmission with the maximum possible resource utilisation. To achieve this, the proposed model is formulated as a controlled‐Markov decision problem in which a hidden Markov model obtains the known and unknown parameters; these are subsequently learnt by an enhanced reinforcement learning scheme to yield maximum resource utilisation while maintaining low buffer cost, holding cost, and bit‐error probability, thereby retaining the QoS provision.
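To make the decision loop described above concrete, the sketch below shows a minimal tabular Q‐learning agent that selects radio power modes so as to minimise a composite QoS cost (buffer cost + holding cost + bit‐error penalty). All state names, actions, cost weights, and the random traffic model standing in for the HMM‐inferred channel state are hypothetical placeholders for illustration; they are not the authors' formulation.

```python
import random

# Illustrative sketch only: tabular Q-learning for dynamic power management.
# States, actions, and cost weights below are hypothetical, not from the paper.
POWER_MODES = ["sleep", "idle", "transmit"]   # candidate radio power modes
TRAFFIC_LEVELS = ["low", "high"]              # observed traffic/channel state

alpha, gamma, epsilon = 0.1, 0.9, 0.1         # learning rate, discount, exploration

# Q[(traffic, mode)] -> estimated long-run cost of picking `mode` in `traffic`
Q = {(t, m): 0.0 for t in TRAFFIC_LEVELS for m in POWER_MODES}

def cost(traffic, mode):
    """Composite QoS cost: buffer cost + holding cost + bit-error penalty.
    All terms and weights are placeholder assumptions."""
    buffer_cost = 1.0 if (traffic == "high" and mode != "transmit") else 0.1
    holding_cost = {"sleep": 0.0, "idle": 0.2, "transmit": 0.5}[mode]
    ber_penalty = 0.8 if (mode == "transmit" and traffic == "high") else 0.0
    return buffer_cost + holding_cost + ber_penalty

def choose_mode(traffic):
    if random.random() < epsilon:                     # explore
        return random.choice(POWER_MODES)
    return min(POWER_MODES, key=lambda m: Q[(traffic, m)])  # exploit (min cost)

traffic = random.choice(TRAFFIC_LEVELS)
for step in range(10_000):
    mode = choose_mode(traffic)
    c = cost(traffic, mode)
    next_traffic = random.choice(TRAFFIC_LEVELS)      # stand-in for HMM-inferred state
    best_next = min(Q[(next_traffic, m)] for m in POWER_MODES)
    # Standard Q-learning update, written for cost minimisation
    Q[(traffic, mode)] += alpha * (c + gamma * best_next - Q[(traffic, mode)])
    traffic = next_traffic
```

After enough iterations, the learned policy (the minimum‐cost mode per traffic level) approximates the switching behaviour the abstract attributes to the DPM controller; the paper's enhanced scheme additionally incorporates HMM‐estimated unknown parameters rather than a random traffic generator.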