In an electronic warfare scenario with a partially known environment, the trial-and-error mechanism of a reinforcement learning algorithm allows a cognitive jammer to disrupt communications between transmitter-receiver pairs adaptively and optimally. However, complete information about the Markov decision process is rarely available, and the thousands of interactions required to learn it are unrealistic in applications that demand timeliness, such as cognitive jamming. Faced with uncooperative or hostile transmitter-receiver pairs, selecting the proper actions to achieve communication denial is a challenging control problem for the jammer. This paper addresses apprenticeship learning in cognitive jamming, in which the jammer uses inverse reinforcement learning to acquire skilled behaviors from expert demonstrations. Without a known prior reward function, the jammer can obtain high jamming performance under the guidance of the given expert strategy. More specifically, apprenticeship learning needs far fewer iterations than Q-learning, which is a vital advantage in a fast-changing environment. Numerical results demonstrate that using apprenticeship learning to learn the jamming strategy is feasible and realistic, since its performance meets or exceeds that of existing methods.
KEYWORDS: cognitive jamming, inverse reinforcement learning, jamming strategy, Markov decision process, reinforcement learning
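The abstract describes apprenticeship learning only at a high level, so the sketch below is illustrative rather than the paper's actual formulation. It assumes the classic projection form of apprenticeship learning via inverse reinforcement learning (feature-expectation matching) on a hypothetical channel-hopping jamming MDP; the channel model, the feature map, and the one-step greedy stand-in for the inner reinforcement-learning step are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CHANNELS = 4   # toy state/action space: channel the transmitter occupies / the jammer targets
GAMMA = 0.9      # discount factor
HORIZON = 30     # rollout length used to estimate feature expectations


def features(state, action):
    """One-hot feature of the (state, action) pair; hitting the occupied channel is the useful event."""
    phi = np.zeros(N_CHANNELS * N_CHANNELS)
    phi[state * N_CHANNELS + action] = 1.0
    return phi


def step(state, action):
    """Hypothetical transmitter dynamics: hop to the next channel when jammed, otherwise stay."""
    return (state + 1) % N_CHANNELS if action == state else state


def feature_expectations(policy, n_rollouts=200):
    """Monte-Carlo estimate of the discounted feature expectations mu(pi)."""
    mu = np.zeros(N_CHANNELS * N_CHANNELS)
    for _ in range(n_rollouts):
        s = int(rng.integers(N_CHANNELS))
        for t in range(HORIZON):
            a = policy(s)
            mu += (GAMMA ** t) * features(s, a)
            s = step(s, a)
    return mu / n_rollouts


def greedy_policy(w):
    """Stand-in for the inner RL step: act greedily on the current reward estimate r(s,a) = w . phi(s,a)."""
    return lambda s: int(np.argmax([w @ features(s, a) for a in range(N_CHANNELS)]))


# Expert demonstration: always jam the channel the transmitter currently occupies.
mu_expert = feature_expectations(lambda s: s)

# Projection-method loop: find reward weights under which the learner's behavior matches the expert's.
mu_bar = feature_expectations(lambda s: int(rng.integers(N_CHANNELS)))  # start from a random policy
for it in range(10):
    w = mu_expert - mu_bar          # reward direction separating the expert from the learner
    margin = np.linalg.norm(w)      # small margin => learner is already close to the expert
    if margin < 1e-3:
        break
    mu_new = feature_expectations(greedy_policy(w))
    d = mu_new - mu_bar
    if d @ d < 1e-12:
        break
    # Project mu_bar toward mu_new (the orthogonal-projection update of the method).
    alpha = np.clip(d @ (mu_expert - mu_bar) / (d @ d), 0.0, 1.0)
    mu_bar = mu_bar + alpha * d
    print(f"iteration {it}: margin = {margin:.3f}")
```

The loop terminates once the learner's feature expectations are close to the expert's, which mirrors the abstract's point that guidance from an expert strategy replaces the long trial-and-error phase required by Q-learning.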