This paper investigates the joint relay and channel selection problem for cooperative communications in a dynamic jamming environment using a deep reinforcement learning (DRL) algorithm. Recent jammers include mobile and smart jammers that employ multiple jamming patterns. Such jammers pose serious challenges to reliable communications, including a huge environment state space, tightly coupled joint action selection, and real-time decision requirements. To cope with these challenges, a DRL-based relay-assisted cooperative communication scheme is proposed. In this scheme, the joint selection problem is formulated as a Markov decision process (MDP), and a double deep Q network (DDQN) based anti-jamming scheme is proposed to address the unknown and dynamic jamming behaviors. Concretely, a joint decision-making network composed of three sub-networks is designed, and an independent learning method for each sub-network is proposed. The simulation results show that the user agent is able to anticipate the jammer's behavior and evade the jamming in advance. Furthermore, compared with the sensing-based algorithm, the Q-learning-based algorithm, and existing DRL-based anti-jamming approaches, the proposed algorithm maintains a higher average normalized throughput.
INTRODUCTION

Relay-assisted cooperative communication is regarded as a direct means of increasing the system transmission rate and expanding the communication coverage [1]. However, cooperative communications are vulnerable to malicious jamming signals, which calls for efficient anti-jamming technologies [2][3][4]. Several techniques, such as power control, frequency hopping, backscatter, and beamforming, have been proposed to combat malicious jamming [5][6][7][8].

Recently, the authors of [9, 10] summarized the jammers' behaviors in each domain and named them jamming patterns. The jammer's policy for frequency decisions can be regarded as its jamming pattern in the frequency domain. As shown in Figure 1(a), in the comb pattern, the jammer transmits jamming signals at fixed frequency points [11]. Figure 1(b) illustrates the sweeping pattern where the jammer hops jamming signals at a

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.