Reinforcement learning (RL) learns by interacting with an environment over time, which makes it well suited to cognitive jamming research, especially in electronic warfare scenarios in which the communication parameters and the jamming effect are unknown to the jammer. In this paper, we propose a jamming strategy algorithm that combines orthogonal matching pursuit (OMP) with a multi-armed bandit (MAB). We construct a dictionary in which each atom represents a symbol error rate (SER) curve that can be computed from a known noise distribution and deterministic parameters. Through reconnaissance, the jammer counts acknowledgement/negative-acknowledgement (ACK/NACK) frames to estimate the SER; these estimates are treated as samples drawn from the real SER curve using an MAB. Given the sampled sequence and the constructed dictionary, the OMP algorithm is used to identify the matching atoms and their corresponding coefficients. From the search results, the jammer constructs an SER curve that approximates the real SER curve. The experimental results demonstrate that the proposed algorithm can learn an optimal jamming strategy in three interactions, converging substantially faster than the state of the art.
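The dictionary-plus-OMP step can be illustrated with a minimal sketch. It is not the paper's implementation, and several things in it are assumptions made only for illustration: the atoms are BPSK SER curves with jamming modelled as additive Gaussian noise, the names `ser_curve`, `omp`, `jam_powers`, and `candidate_powers` are hypothetical, the three sample positions are fixed by hand rather than chosen by an MAB, and the samples are noise-free rather than estimated from ACK/NACK counts.

```python
import math
import numpy as np

def ser_curve(signal_power, jam_powers, noise_power=1.0):
    """BPSK SER versus jamming power; jamming is treated as extra Gaussian noise (assumption)."""
    sinr = signal_power / (noise_power + jam_powers)
    return 0.5 * np.vectorize(math.erfc)(np.sqrt(sinr))

def omp(D, y, n_atoms):
    """Greedy OMP: select n_atoms columns of D and least-squares coefficients that explain y."""
    norms = np.linalg.norm(D, axis=0)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_atoms):
        # Correlate the residual with every (norm-compensated) atom.
        corr = np.abs(D.T @ residual) / norms
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Re-fit all selected atoms jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    return support, coef

# Dictionary: one SER curve per hypothetical received signal power.
jam_powers = np.linspace(0.0, 10.0, 21)
candidate_powers = [1.0, 2.0, 4.0, 8.0, 16.0]
D = np.column_stack([ser_curve(p, jam_powers) for p in candidate_powers])

# The jammer only observes the real curve at a few jamming powers.
sampled_idx = [2, 10, 18]
true_curve = ser_curve(8.0, jam_powers)
y = true_curve[sampled_idx]

# Run OMP on the sampled rows, then rebuild the full curve from the chosen atom.
support, coef = omp(D[sampled_idx, :], y, n_atoms=1)
estimated_curve = D[:, support] @ coef
```

In this toy setting the true curve is itself a dictionary atom, so a single OMP iteration on three samples is enough to rebuild the full curve. With SER values estimated from finite ACK/NACK counts, the match would only be approximate.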