The JAYA algorithm is a population-based meta-heuristic algorithm proposed in recent years which has been proved to be suitable for solving global optimization and engineering optimization problems because of its simplicity, easy implementation, and guiding characteristic of striving for the best and avoiding the worst. In this study, an improved discrete JAYA algorithm based on reinforcement learning and simulated annealing (QSA-DJAYA) is proposed to solve the well-known traveling salesman problem in combinatorial optimization. More specially, firstly, the basic Q-learning algorithm in reinforcement learning is embedded into the proposed algorithm such that it can choose the most promising transformation operator for the current state to update the solution. Secondly, in order to balance the exploration and exploitation capabilities of the QSA-DJAYA algorithm, the Metropolis acceptance criterion of the simulated annealing algorithm is introduced to determine whether to accept candidate solutions. Thirdly, 3-opt is applied to the best solution of the current iteration at a certain frequency to improve the efficiency of the algorithm. Finally, to evaluate the performance of the QSA-DJAYA algorithm, it has been tested on 21 benchmark datasets taken from TSPLIB and compared with other competitive algorithms in two groups of comparative experiments. The experimental and the statistical significance test results show that the QSA-DJAYA algorithm achieves significantly better results in most instances.