This work proposes a novel variant of the reptile search algorithm (RSA) for solving optimization problems, called the reinforcement reptile search algorithm (RLRSA). The basic RSA performs exploration through high walking in the first half of the search process, while exploitation is carried out through the hunting phase in the second half. Because this schedule is fixed, the algorithm cannot balance exploration and exploitation, and it tends to become trapped in local optima. A novel learning method based on reinforcement learning and the Q-learning model is proposed to balance the exploration and exploitation phases when the solution starts deteriorating. Furthermore, random opposite-based learning (ROBL) is introduced to increase the diversity of the population and thereby improve the obtained solutions. Twenty-three typical benchmark functions, including unimodal, multimodal and fixed-dimension multimodal functions, were employed to assess the performance of RLRSA. According to the findings, RLRSA surpasses the standard RSA in the majority of the benchmark functions evaluated, specifically in 12 out of 13 unimodal functions, 9 out of 13 multimodal functions, and 8 out of 10 fixed-dimension multimodal functions. RLRSA is also applied to the pressure vessel and tension/compression spring design problems, where the results show that it found the solution with the minimum cost. The experimental results reveal the superiority of RLRSA over RSA and other optimization methods in the literature.
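The two mechanisms named above can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The ROBL candidate uses the commonly cited form lb + ub - r * x with r drawn uniformly from [0, 1), and the Q-update is the standard tabular Q-learning rule. The function names, the greedy selection step, the two-state/two-action table (exploration, exploitation), the reward value, and the values of alpha and gamma are all assumptions made for illustration.

```python
import random


def random_opposite(x, lb, ub, rng=random):
    """Random opposite candidate, lb + ub - r * x per dimension, clipped to bounds."""
    candidate = []
    for xj, lo, hi in zip(x, lb, ub):
        value = lo + hi - rng.random() * xj
        candidate.append(min(max(value, lo), hi))
    return candidate


def robl_select(x, lb, ub, fitness, rng=random):
    """Keep the better of x and its random opposite (greedy choice, assumed here)."""
    candidate = random_opposite(x, lb, ub, rng)
    return candidate if fitness(candidate) < fitness(x) else x


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; alpha and gamma are illustrative values."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def sphere(x):
    """Simple unimodal test function used only for this example."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    rng = random.Random(0)
    lb, ub = [-5.0, -5.0], [5.0, 5.0]
    x = [3.0, -2.0]
    better = robl_select(x, lb, ub, sphere, rng)
    print("fitness before:", sphere(x), "after:", sphere(better))

    # States and actions: 0 = exploration, 1 = exploitation (assumed encoding).
    q = [[0.0, 0.0], [0.0, 0.0]]
    # Assumed reward: +1 when switching to exploitation improved the best fitness.
    q_update(q, state=0, action=1, reward=1.0, next_state=1)
    print("Q-table:", q)
```

In this sketch the Q-table would be consulted each iteration to choose between the exploration and exploitation updates, and the greedy ROBL step never makes a solution worse because the original is kept unless the opposite candidate has lower cost.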