Reinforcement learning, particularly Q-learning, has attracted considerable attention from researchers in recent decades owing to its remarkable performance across a wide range of applications. This study proposes a novel Reinforcement Learning-inspired Tunicate Swarm Algorithm (RLTSA) that employs a Q-learning approach to enhance the convergence accuracy and local search efficacy of tunicates in the Tunicate Swarm Algorithm (TSA) while preventing entrapment in local optima. First, a novel Chaotic Quasi-Reflection-Based Learning (CQRBL) strategy with ten chaotic maps is proposed to improve convergence reliability. Q-learning is then embedded in TSA to dynamically switch between the CQRBL and ROBL learning mechanisms at different stages for distinct problems. These two strategies within the Q-learning framework significantly improve the efficiency of the proposed algorithm. The performance of RLTSA is evaluated on a set of 33 distinct functions, including the CEC'05 and CEC'19 test functions, as well as four engineering design problems, and its outcomes are compared statistically and graphically against TSA and seven other well-established meta-heuristics. In addition, statistical tests, namely the Friedman test, the Wilcoxon rank-sum test, and the t-test, are employed to demonstrate the superiority of RLTSA. The experimental findings reveal that RLTSA outperforms the competing algorithms on real-world engineering design problems.
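The abstract does not specify the Q-learning formulation, but the switching mechanism it describes can be sketched in outline. The snippet below is a minimal, hypothetical illustration: a tabular, epsilon-greedy Q-learning agent selects between two candidate strategies (indexing CQRBL and ROBL) at each iteration, shown alongside a quasi-reflection operator and the logistic chaotic map commonly used in such learning strategies. All state, action, and reward definitions and the parameter values (`alpha`, `gamma`, `eps`) are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def logistic_map(x):
    """One step of the logistic chaotic map, a common choice among chaotic maps."""
    return 4.0 * x * (1.0 - x)

def quasi_reflection(x, lb, ub, rng):
    """Quasi-reflected point: a uniform sample between the interval centre and x,
    as in standard quasi-reflection-based learning."""
    c = (lb + ub) / 2.0
    return rng.uniform(np.minimum(c, x), np.maximum(c, x))

class QLearningSwitcher:
    """Tabular Q-learning over (state, action) pairs; each action indexes a
    learning strategy (e.g., 0 -> CQRBL, 1 -> ROBL). Hypothetical sketch."""

    def __init__(self, n_states=2, n_actions=2, alpha=0.1, gamma=0.9,
                 eps=0.1, seed=0):
        self.q = np.zeros((n_states, n_actions))   # Q-table
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = np.random.default_rng(seed)

    def select(self, s):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.eps:            # explore
            return int(self.rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[s]))            # exploit

    def update(self, s, a, r, s_next):
        """Standard one-step Q-learning update."""
        td_target = r + self.gamma * np.max(self.q[s_next])
        self.q[s, a] += self.alpha * (td_target - self.q[s, a])
```

In such a sketch, the reward might be +1 when the selected strategy improves the best-so-far fitness and 0 otherwise, and the state might encode the current search phase (e.g., early exploration versus late exploitation); these choices are illustrative only and the paper's actual reward and state design may differ.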