We propose a new method to enhance stochastic resonance based on reinforcement learning, which does not require a priori knowledge of the underlying dynamics. The reward function of the reinforcement learning algorithm is determined by introducing a moving signal-to-noise ratio, which promptly quantifies the ratio of signal power to noise power over a continuously updated time series of fixed length. To maximize the cumulative reward, the reward function guides the actions toward increasing the signal-to-noise ratio of the system as much as possible with the help of the moving signal-to-noise ratio. Because the occurrence of a spike in an excitable system, which requires the system to evolve for some time, is an essential ingredient of the signal-to-noise ratio, the reward corresponding to the current moment cannot be obtained immediately; this usually results in a delayed reward. The delayed reward may cause the policy of the reinforcement learning algorithm to be updated with an incompatible reward, which degrades the stability and convergence of the algorithm. To overcome this challenge, we devise a double Q-table technique, in which one Q-table is used to generate actions and the other is used to correct deviations. In this way, the policy can be updated with the corresponding reward, which improves the stability of the algorithm and accelerates its convergence. With two illustrative examples, the FitzHugh–Nagumo and Hindmarsh–Rose models, we show that the proposed method significantly enhances two typical types of stochastic resonance: classical stochastic resonance with a weak signal and coherence resonance without a weak signal. We also demonstrate the robustness of the proposed method.
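
As a concrete illustration of the moving signal-to-noise ratio described above, the sketch below computes a spectral SNR over a sliding window of fixed length. The specific spectral definition (power in the bin at the drive frequency divided by the averaged power of neighbouring background bins) is a common convention in the stochastic-resonance literature and may differ from the definition used in the paper; the function name `moving_snr` and all parameters (`dt`, `drive_freq`, `window_len`, `n_background`) are illustrative assumptions, not part of the original method description.

```python
# Minimal sketch of a moving signal-to-noise ratio over a fixed-length
# sliding window. The spectral SNR definition and all parameter names are
# illustrative assumptions, not taken from the paper.
import numpy as np


def moving_snr(x, dt, drive_freq, window_len, n_background=10):
    """SNR of the most recent fixed-length window of the time series x.

    x           : 1-D array of the observed output (e.g. membrane voltage)
    dt          : sampling interval
    drive_freq  : frequency of the weak periodic signal
    window_len  : number of samples kept in the moving window
    n_background: neighbouring bins used to estimate the noise floor
    """
    window = np.asarray(x[-window_len:], dtype=float)
    window = window - window.mean()                    # remove DC component

    # One-sided power spectrum of the windowed series.
    spectrum = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(window.size, d=dt)

    # Signal power: the bin closest to the drive frequency.
    k = int(np.argmin(np.abs(freqs - drive_freq)))
    signal_power = spectrum[k]

    # Noise power: average of neighbouring bins, excluding the signal bin
    # and the DC bin.
    lo, hi = max(k - n_background, 1), min(k + n_background + 1, spectrum.size)
    neighbours = np.concatenate((spectrum[lo:k], spectrum[k + 1:hi]))
    noise_power = neighbours.mean() if neighbours.size else np.inf

    return signal_power / noise_power
```

In a reinforcement-learning loop, the per-step reward could then be derived from this quantity (for instance, its change after the system has evolved under the chosen action, such as an adjustment of the noise intensity); this is also where the delayed-reward issue mentioned above arises, since the window must accumulate post-action samples before the SNR reflects the effect of that action.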