As an effective anti-jamming approach, frequency-hopping (FH) technology has been widely applied to tactical communication systems, where it provides reliable communication and improves resilience against conventional interference in strongly contested environments. A key challenge facing tactical wireless communication networks is the smart follower jammer, which has responsive spectrum reconnaissance and intelligent decision-making capabilities. In response, this article investigates a deep reinforcement learning based anti-jamming scheme with the aim of maximizing system throughput. The interactions between a radio transmitter and a smart follower jammer are formulated as a hierarchical anti-jamming dynamic game, in which the radio terminal decides its transmission power and hopping rate according to state feedback information, and the jammer chooses its spectrum scanning rate accordingly to minimize the reward of the FH communication system. We prove that the game admits a Nash equilibrium (NE) strategy in both static and dynamic environments. An anti-jamming scheme based on a double deep Q-network with prioritized experience replay (PDDQN) is proposed to approximate the optimal power control and hopping strategy without knowledge of the environment or the jamming parameters. Finally, simulation results demonstrate that the proposed algorithm improves throughput and jamming resistance.
INTRODUCTION

Because military communication networks rely heavily on robust links and are susceptible to many forms of hostile electronic attack, wireless communication systems must be equipped with strong interference suppression capabilities in contested environments. As an effective means of improving link reliability and jamming resistance, broadband high-speed frequency-hopping (FH) technology has been applied extensively in military tactical communication to weaken the effects of various types of intentional jamming. Variable hopping rate and power adaptation are commonly used techniques, each applied separately to mitigate interference.1,2 However, these conventional methods cannot adapt to a dynamic environment, especially in a tactical command and control (C2) network with highly mobile user nodes. Our aim in this article is to investigate the effectiveness of a jointly optimized adaptation approach that offers flexible control over both signal strength and hopping rate.