Router node placement (RNP) is an important issue in the design and implementation of wireless mesh networks (WMNs). It is known to be an NP-hard problem, for which no efficient exact algorithm is known. Consequently, approximate optimization strategies are commonly used to solve it. For wide-area WMNs with high node density, however, approximation algorithms often face difficulties, and a more effective solution is needed. This motivated the present work. We propose a new method for solving the RNP problem using reinforcement learning (RL). The RNP problem is modeled as an RL problem in which the environment, agent, action, and reward correspond to the network system, the routers, coordinate adjustment, and network connectivity, respectively. To the best of our knowledge, this is the first study that applies RL to the RNP problem. Experimental results show that the proposed method increases network connectivity by up to 22.73% compared with the most recent methods.
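To make the RL mapping concrete, the following is a minimal sketch of one environment step. It is not the method evaluated in this work, and it rests on several assumptions of ours: routers sit on a bounded 2D area, two routers are linked when their distance is at most a fixed `radius`, the action set is a unit move along one axis (or no move), and the reward is the size of the largest connected component of the router graph. The names `ACTIONS`, `connectivity`, and `step` are hypothetical.

```python
import math
from itertools import combinations

# Assumed action set: stay, or move one unit along the x or y axis.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def connectivity(routers, radius):
    """Size of the largest connected component of the router graph.

    Assumption: two routers are linked when their Euclidean distance
    is at most `radius`.
    """
    n = len(routers)
    adj = {i: [] for i in range(n)}
    for i, j in combinations(range(n), 2):
        if math.dist(routers[i], routers[j]) <= radius:
            adj[i].append(j)
            adj[j].append(i)

    seen, best = set(), 0
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search over one component, counting its routers.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def step(routers, agent, action, radius, width, height):
    """Apply one coordinate adjustment to one router (the agent).

    Returns the new placement (state) and its connectivity (reward).
    Moves are clamped so the router stays inside the deployment area.
    """
    dx, dy = ACTIONS[action]
    x, y = routers[agent]
    new_x = min(max(x + dx, 0), width)
    new_y = min(max(y + dy, 0), height)
    new_routers = list(routers)
    new_routers[agent] = (new_x, new_y)
    return new_routers, connectivity(new_routers, radius)
```

For example, with routers at `(0, 0)`, `(3, 0)`, and `(10, 10)` and `radius=2`, no two routers are linked; moving the second router one unit to the left brings it within range of the first, so the reward under this assumed definition rises from 1 to 2.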