Reinforcement learning and genetic algorithms each offer distinct advantages for combinatorial optimization, and this study combines the two to tackle a classic problem of this kind, the travelling salesman problem (TSP). The TSP asks for the shortest closed tour that visits each of a given set of cities exactly once and returns to the starting city. The proposed approach pairs the path-selection ability of reinforcement learning with the global search strategy of genetic algorithms in order to find better TSP solutions. Specifically, a dual Q-learning algorithm is first used to generate multiple high-quality paths. These paths then serve as parents in the initial population of a genetic algorithm, whose evolutionary search improves them further. The paper details the problem modelling process, including the definition and description of the TSP instances and the formal objective function. Experimental results show that the combined approach substantially improves TSP optimization, offering a new perspective and method for tackling combinatorial optimization problems.
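The two-stage pipeline described above (Q-learning proposes tours, the genetic algorithm refines them) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. "Dual Q-learning" is read here as double Q-learning, which uses two Q-tables where one selects the greedy action and the other evaluates it. States are the current city, actions are unvisited cities, and the reward is the negative edge length. The genetic algorithm uses tournament selection, a simple order-crossover variant, swap mutation and elitism. All function names (`double_q_tours`, `genetic_algorithm`, `order_crossover`) and hyperparameters are hypothetical.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour (the last city connects back to the first)."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def double_q_tours(dist, episodes=2000, alpha=0.1, gamma=0.9, eps=0.2, n_tours=5, seed=0):
    """Hypothetical seeding stage: train two Q-tables, then read greedy tours from them."""
    rng = random.Random(seed)
    n = len(dist)
    qa = [[0.0] * n for _ in range(n)]
    qb = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        start = rng.randrange(n)
        city, unvisited = start, set(range(n)) - {start}
        while unvisited:
            # Epsilon-greedy action selection on the sum of both tables.
            if rng.random() < eps:
                nxt = rng.choice(list(unvisited))
            else:
                nxt = max(unvisited, key=lambda a: qa[city][a] + qb[city][a])
            reward = -dist[city][nxt]  # assumption: reward = negative edge length
            unvisited.discard(nxt)
            # Double Q-learning: one table picks the argmax, the other evaluates it.
            upd, ev = (qa, qb) if rng.random() < 0.5 else (qb, qa)
            if unvisited:
                best = max(unvisited, key=lambda a: upd[nxt][a])
                target = reward + gamma * ev[nxt][best]
            else:
                target = reward - dist[nxt][start]  # terminal step also pays the closing edge
            upd[city][nxt] += alpha * (target - upd[city][nxt])
            city = nxt
    # Extract one greedy tour per distinct start city.
    tours = []
    for start in rng.sample(range(n), min(n_tours, n)):
        tour, unvisited = [start], set(range(n)) - {start}
        while unvisited:
            cur = tour[-1]
            nxt = max(unvisited, key=lambda a: qa[cur][a] + qb[cur][a])
            tour.append(nxt)
            unvisited.discard(nxt)
        tours.append(tour)
    return tours


def order_crossover(p1, p2, rng):
    """Copy a random slice from p1, fill the remaining positions in p2's city order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    fill = iter(c for c in p2 if c not in kept)
    return [c if c is not None else next(fill) for c in child]


def genetic_algorithm(dist, seeds, pop_size=40, generations=300, p_mut=0.2, seed=0):
    """Hypothetical refinement stage: GA whose initial population contains the seed tours."""
    rng = random.Random(seed)
    n = len(dist)
    length = lambda t: tour_length(t, dist)
    pop = [list(t) for t in seeds]
    while len(pop) < pop_size:
        pop.append(rng.sample(range(n), n))  # pad with random permutations
    for _ in range(generations):
        pop.sort(key=length)
        new_pop = pop[:2]  # elitism: the two best tours survive unchanged
        while len(new_pop) < pop_size:
            p1 = min(rng.sample(pop, 3), key=length)  # size-3 tournament selection
            p2 = min(rng.sample(pop, 3), key=length)
            child = order_crossover(p1, p2, rng)
            if rng.random() < p_mut:
                a, b = rng.sample(range(n), 2)
                child[a], child[b] = child[b], child[a]  # swap mutation
            new_pop.append(child)
        pop = new_pop
    return min(pop, key=length)


if __name__ == "__main__":
    rng = random.Random(42)
    cities = [(rng.random(), rng.random()) for _ in range(15)]
    dist = [[math.dist(a, b) for b in cities] for a in cities]
    seeds = double_q_tours(dist)
    best = genetic_algorithm(dist, seeds)
    print("best seed:", min(tour_length(t, dist) for t in seeds))
    print("after GA:", tour_length(best, dist))
```

Because the best individual is always carried over, the returned tour is never longer than the best Q-learning seed. The genetic algorithm therefore can only refine the reinforcement-learning proposals, which matches the intent of using them as progenitors.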