Cognitive Radio (CR) provides a promising solution to the spectrum scarcity problem in dense wireless networks, where the sensing ability of cognitive users helps them acquire knowledge of the environment. However, because they operate over a shared medium, cognitive users are vulnerable to different types of attacks. In particular, jamming is considered one of the most challenging security threats in CR networks: an attacker disrupts communication by transmitting a high-power noise signal in the vicinity of the targeted node. The jammer may be an intelligent entity capable of exploiting the dynamics of the environment. In this work, we propose a machine-learning-based anti-jamming technique for CR networks to evade a hostile jammer, where both the jamming and anti-jamming processes are formulated within the Markov game framework. In our framework, secondary users avoid the jammer by maximizing their payoff functions using an online, model-free reinforcement learning technique called Q-learning. We consider a realistic mathematical model in which the channel conditions are time-varying and differ from one sub-channel to another, as in practical scenarios. Simulation results show that our proposed approach outperforms existing anti-jamming approaches over a wide range of scenarios.
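
To make the learning loop concrete, the following minimal sketch shows tabular Q-learning applied to channel selection against a jammer. The state encoding (the previously used channel), the +1/-1 reward, the sweeping-jammer model, and all hyperparameters are illustrative assumptions; they are not the system model of this work, which considers time-varying, sub-channel-dependent conditions within a Markov game formulation.

```python
# Minimal sketch of Q-learning for anti-jamming channel selection.
# All parameters, the reward values, and the sweeping-jammer model below
# are illustrative assumptions, not the paper's exact system model.
import numpy as np

NUM_CHANNELS = 6          # assumed number of sub-channels
ALPHA, GAMMA = 0.1, 0.9   # assumed learning rate and discount factor
EPSILON = 0.1             # assumed exploration rate for epsilon-greedy
SLOTS = 5000

rng = np.random.default_rng(0)

# State: channel used by the secondary user in the previous slot.
# Action: channel chosen for the current slot.
Q = np.zeros((NUM_CHANNELS, NUM_CHANNELS))


def jammer_channel(t: int) -> int:
    """Assumed sweeping jammer that cycles through the channels slot by slot."""
    return t % NUM_CHANNELS


state = 0
for t in range(SLOTS):
    # Epsilon-greedy action selection (online, model-free).
    if rng.random() < EPSILON:
        action = int(rng.integers(NUM_CHANNELS))
    else:
        action = int(np.argmax(Q[state]))

    # Reward: +1 for an unjammed transmission, -1 if the jammer hit our channel.
    reward = 1.0 if action != jammer_channel(t) else -1.0

    next_state = action
    # Standard Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    Q[state, action] += ALPHA * (reward + GAMMA * Q[next_state].max() - Q[state, action])
    state = next_state

print("Preferred next channel per state:", np.argmax(Q, axis=1))
```

Because the update is model-free, the secondary user never needs to estimate the jammer's strategy explicitly; it only observes whether each transmission succeeded, which is what makes this style of learning suitable for the online setting described above.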