As intelligent transportation systems have progressed, vehicular ad hoc networks have been identified as a suitable technology for intelligent communication among smart city stakeholders. However, the growing use of wireless technologies in highly mobile environments creates a challenging context for communication. Increasing communication reliability in this environment requires intelligent tools that solve the routing problem and yield a more stable communication system. Reinforcement Learning (RL) is well suited to this problem. We propose a complex objective space that combines vehicle geo-positioning information, propagation signal strength, and environmental path loss with obstacles (a city map with buildings) to train our model and select the best route according to route stability and hop count. The results show a significant improvement in route strength compared with traditional communication protocols, and even compared with other RL approaches that use only one parameter for decision making.
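The abstract does not name the RL algorithm or the exact reward. The following is a minimal sketch of how signal strength, obstacle path loss, and hop count can be folded into a single routing reward. It assumes tabular Q-learning for next-hop selection, a log-distance path-loss model with a fixed attenuation per building crossed, and a per-hop reward equal to a link-stability score minus a hop penalty. Every constant and function name (`rssi`, `link_stability`, `train_q_routing`, `greedy_route`) is a hypothetical illustration, not the authors' model.

```python
import math
import random

# Illustrative constants: assumptions for this sketch, not values from the paper.
TX_POWER_DBM = 20.0
PL_D0_DB = 40.0           # path loss at a 1 m reference distance
PL_EXPONENT = 2.7         # log-distance path-loss exponent
WALL_LOSS_DB = 5.0        # extra attenuation per building crossed
RX_THRESHOLD_DBM = -85.0  # minimum received power for a usable link
HOP_PENALTY = 1.0         # cost charged for every hop
DEST_REWARD = 10.0        # bonus for reaching the destination


def blocked(p, q, box, samples=50):
    """Approximate test: does segment p->q pass through box (xmin, ymin, xmax, ymax)?"""
    xmin, ymin, xmax, ymax = box
    for i in range(samples + 1):
        t = i / samples
        x, y = p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return True
    return False


def rssi(p, q, buildings):
    """Received power (dBm): log-distance path loss plus a fixed loss per building."""
    d = max(math.dist(p, q), 1.0)
    loss = PL_D0_DB + 10 * PL_EXPONENT * math.log10(d)
    loss += WALL_LOSS_DB * sum(blocked(p, q, b) for b in buildings)
    return TX_POWER_DBM - loss


def link_stability(power_dbm):
    """Map the margin above the receive threshold to a score in [0, 1]."""
    return min(max((power_dbm - RX_THRESHOLD_DBM) / 20.0, 0.0), 1.0)


def train_q_routing(pos, buildings, dest, episodes=3000, alpha=0.5,
                    gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    rng = random.Random(seed)
    # Neighbours are nodes whose received power clears the threshold.
    nbrs = {u: [v for v in pos if v != u
                and rssi(pos[u], pos[v], buildings) >= RX_THRESHOLD_DBM]
            for u in pos}
    q = {u: {v: 0.0 for v in nbrs[u]} for u in pos}
    starts = [u for u in pos if u != dest]
    for _ in range(episodes):
        node = rng.choice(starts)
        for _ in range(max_steps):
            if not nbrs[node]:
                break
            # Epsilon-greedy choice of the next hop.
            if rng.random() < epsilon:
                nxt = rng.choice(nbrs[node])
            else:
                nxt = max(q[node], key=q[node].get)
            # Stronger links cost less; every hop costs HOP_PENALTY.
            r = link_stability(rssi(pos[node], pos[nxt], buildings)) - HOP_PENALTY
            if nxt == dest:
                target = r + DEST_REWARD  # terminal transition, no bootstrap
            else:
                target = r + gamma * max(q[nxt].values(), default=0.0)
            q[node][nxt] += alpha * (target - q[node][nxt])
            if nxt == dest:
                break
            node = nxt
    return q


def greedy_route(q, src, dest, max_hops=20):
    """Follow the highest-valued next hop from src until dest (or give up)."""
    route = [src]
    while route[-1] != dest and len(route) <= max_hops and q[route[-1]]:
        route.append(max(q[route[-1]], key=q[route[-1]].get))
    return route
```

As a toy usage example, place S at (0, 0), D at (280, 0), relays A at (140, 0) and B at (140, 80), and one building spanning x = 60 to 80, y = -10 to 10. S and D are out of range of each other, so both candidate routes take two hops. The building weakens the S to A link, so the learned policy should route S, B, D. Because hop count and link strength share one reward, the relative weight of `HOP_PENALTY` decides how much signal margin justifies an extra hop.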