Pulse jamming is a common malicious jamming pattern that can significantly reduce the reliability of wireless communication. This paper investigates anti-jamming communication in a random pulse jamming environment. To obtain a time-domain countermeasure, the problem is modeled and analyzed as a Markov decision process (MDP), and a time-domain anti-pulse jamming algorithm (TDAA) based on reinforcement learning is proposed. The proposed algorithm learns from its dynamic interaction with the jamming environment and gradually approximates the optimal time-domain strategy. The optimal strategy lets the transmitter switch between two states, "active" and "silent", to avoid random pulse jamming. In addition, a state estimation and adjustment method for the random pulse jamming environment is introduced to improve the robustness of the proposed TDAA. Simulation results show that, compared with continuous transmission, the proposed TDAA effectively reduces the jamming collision ratio and significantly improves the normalized throughput. Compared with the transmitting terminal Q-learning algorithm (TTQA), the proposed TDAA also achieves a higher time utilization ratio and normalized throughput.

INDEX TERMS Anti-jamming, Markov decision process, Q-learning, reinforcement learning, random pulse jamming.
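The following is a minimal sketch of the general idea of learning an "active"/"silent" time-domain policy with tabular Q-learning. It is not the authors' TDAA, and everything in it is a hypothetical assumption. The jammer emits a pulse of random width (1 or 2 slots) followed by a fixed 4-slot gap. The state is (was the last slot jammed?, length of that run). The rewards are +1 for a clean transmission, -2 for a collision and 0 for staying silent, with the penalty chosen so that transmitting into a slot that is jammed with probability 1/2 is not worth it. The transmitter is assumed to sense whether each slot was jammed even while silent. All function names are illustrative.

```python
import random
from collections import defaultdict

SILENT, ACTIVE = 0, 1
ACTIONS = (SILENT, ACTIVE)
MAX_RUN = 5  # hypothetical cap on the run-length counter kept in the state


def pulse_jammer(rng, quiet_len=4, widths=(1, 2)):
    """Hypothetical random pulse jammer: a pulse of random width, then a fixed quiet gap."""
    while True:
        for _ in range(rng.choice(widths)):
            yield True   # slot is jammed
        for _ in range(quiet_len):
            yield False  # slot is clear


def next_state(state, jammed):
    """State = (was the last slot jammed?, length of that run, capped at MAX_RUN)."""
    last, run = state
    if jammed == last:
        return (last, min(run + 1, MAX_RUN))
    return (jammed, 1)


def reward(action, jammed):
    """Assumed rewards: silent = 0, clean transmission = +1, collision = -2."""
    if action == SILENT:
        return 0.0
    return -2.0 if jammed else 1.0


def train(steps=100_000, gamma=0.5, epsilon=0.1, seed=1):
    """Epsilon-greedy tabular Q-learning with a 1/N(s, a) step size."""
    rng = random.Random(seed)
    q = defaultdict(float)
    visits = defaultdict(int)
    jammer = pulse_jammer(rng)
    state = (False, 1)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit (ties -> SILENT)
        jammed = next(jammer)
        nxt = next_state(state, jammed)
        visits[(state, action)] += 1
        alpha = 1.0 / visits[(state, action)]
        target = reward(action, jammed) + gamma * max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = nxt
    return q


def greedy(q):
    """Turn a Q-table into a deterministic policy."""
    return lambda s: max(ACTIONS, key=lambda a: q[(s, a)])


def evaluate(policy, steps=10_000, seed=2):
    """Return (collisions per transmission, successful slots per slot)."""
    rng = random.Random(seed)
    jammer = pulse_jammer(rng)
    state = (False, 1)
    collisions = successes = transmissions = 0
    for _ in range(steps):
        action = policy(state)
        jammed = next(jammer)
        if action == ACTIVE:
            transmissions += 1
            if jammed:
                collisions += 1
            else:
                successes += 1
        state = next_state(state, jammed)
    return collisions / max(transmissions, 1), successes / steps
```

For example, `evaluate(greedy(train()))` can be compared with `evaluate(lambda s: ACTIVE)`, which always transmits. In this toy model a collision costs only the jammed slot itself, so the sketch only illustrates how a silent/active schedule that avoids pulses is learned. It does not reproduce the throughput comparison reported in the abstract.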