Joint relay and channel selection in relay‐aided anti‐jamming system: A reinforcement learning approach

As an effective anti-jamming approach, frequency-hopping (FH) technology has been widely applied in tactical communication systems, providing reliable communication and improving resilience against conventional interference in strongly contested environments. A key challenge for tactical wireless communication networks is the smart follower jammer, which has responsive spectrum reconnaissance and intelligent decision-making capabilities. In response, this article investigates a deep reinforcement learning based anti-jamming scheme with the aim of maximizing the system throughput. The interactions between a radio transmitter and a smart follower jammer are formulated as a hierarchical anti-jamming dynamic game model, in which the radio terminal decides its transmission power and hopping rate according to the state feedback information, and the jammer chooses its spectrum scanning rate accordingly to minimize the rewards of the FH communication system. We prove that a Nash equilibrium (NE) strategy exists in the game for both static and dynamic environments. A double deep Q-network with prioritized experience replay (PDDQN) based anti-jamming scheme is proposed to approximate the optimal power control and hopping strategy without knowledge of the environment and jamming parameters. Finally, simulation results demonstrate that the proposed algorithm efficiently provides better throughput and jamming resistance.

INTRODUCTION

Because military communication networks rely heavily on robust links and are susceptible to various forms of hostile electronic attack, wireless communication systems must be equipped with robust interference suppression capabilities in contested environments. As an effective anti-jamming method for improving link reliability, broadband high-speed frequency-hopping (FH) technology has been applied extensively in military tactical communication to weaken the effects of various types of intentional jamming. Variable hopping rate and power adaption are commonly used, separately, to mitigate interference [1,2]. However, conventional methods cannot adapt to a dynamic environment, especially in a tactical command and control (C2) network with highly mobile user nodes. Our target in this article is to investigate the effectiveness of a jointly optimized adaption approach that has flexible control over the signal strength and hopping rate.
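The abstract above names a double deep Q-network with prioritized experience replay but gives no update rule. The following is a minimal sketch of the two standard ingredients and not the authors' implementation: the function names, the default `gamma`, `alpha`, and `eps` values, and the reading of each action index as one (power level, hopping rate) pair are all assumptions made for illustration.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double-DQN target for one transition.

    q_online_next / q_target_next are the Q-values of the next state as
    estimated by the online and target networks (one entry per action;
    here an action is assumed to be a (power, hopping-rate) pair).
    """
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    # The online network selects the action ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the target network evaluates it, which reduces overestimation.
    return reward + gamma * q_target_next[best_action]


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


if __name__ == "__main__":
    # Hypothetical numbers, used only to exercise the two functions.
    print(double_dqn_target(1.0, False, [0.2, 0.9], [1.0, 2.0], gamma=0.5))
    print(sampling_probabilities([0.1, 1.0, 2.0]))
```

Transitions with a larger temporal-difference error receive a larger sampling probability, so the agent replays surprising jamming outcomes more often. A full prioritized-replay implementation would also apply importance-sampling weights to correct the resulting bias, which this sketch omits.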

Section: Related Work (mentioning)

confidence: 99%

Joint power and hopping rate adaption against follower jammer based on deep reinforcement learning

Wang

Zhang

2022

“…Furthermore, Reference 25 has presented a deep learning‐based model for spectrum sensing in the CR domain for capturing the temporal correlation features from spectrum data. A reinforcement learning‐based approach has been used in Reference 26 to investigate a joint relay and channel selection problem in a multirelay antijamming communication system. The authors of Reference 27 introduced a novel approach to design precoder weights for multiple objectives using DNN.…”

Section: Introduction (mentioning)

confidence: 99%

Data‐driven approach to design energy‐efficient joint precoders at source and relay using deep learning in MIMO‐CRNs

Sahu

Maurya

Bansal

et al. 2022

This article studies the problem of designing energy-efficient joint precoders at the source and relay for multiple-input multiple-output cognitive relay networks (MIMO-CRNs). Existing optimization methods typically suffer from the high computational complexity of finding the optimal solution for such nonconvex fractional programming problems. In contrast to prior works, this article considers a data-driven approach that designs the joint precoders using a deep neural network (DNN). The proposed DNN learns the optimal precoder weights on a set of different channel matrices during the offline training phase, which reduces the computational cost in the online deployment phase. The numerical results demonstrate that this approach provides comparable performance at significantly lower computational complexity than the conventional optimization-based algorithm. Furthermore, the proposed approach is shown to be quite robust against variations in the channel statistics, which makes it suitable for real-time implementation.

“…In Reference 12, an RL‐based algorithm is used to increase throughput and solve the problem of collisions between primary and secondary users in a spectrum sharing environment. In Reference 13, the throughput of a multirelay system with jamming is increased using the RL algorithm.…”

Section: Introduction (mentioning)

confidence: 99%

Multipower‐level Q‐learning algorithm for random access in nonorthogonal multiple access massive machine‐type communications systems

Silva

Abrão

2022

The massive machine-type communications (mMTC) service will be one of the new services planned for integration beyond the fifth generation of wireless communication. In mMTC, thousands of devices sporadically access available resource blocks on the network. In this scenario, the massive random access problem arises when two or more devices collide by selecting the same resource block. There are several techniques to deal with this problem. One of them deploys Q-learning (QL), in which devices store in their Q-table the rewards sent by the central node that indicate the quality of the transmission performed. The device learns which resource blocks are best to select and transmit on in order to avoid collisions. We propose a multipower-level QL (MPL-QL) algorithm that uses a nonorthogonal multiple access (NOMA) transmit scheme to generate transmission power diversity and allow more than one device to be accommodated in the same time-slot, as long as the signal-to-interference-plus-noise ratio (SINR) exceeds a threshold value. The numerical results reveal that the best performance-complexity trade-off is obtained by using a higher number of power levels, typically eight levels. The proposed MPL-QL can deliver better throughput and lower latency when compared to other recent QL-based algorithms found in the literature.

INTRODUCTION

Machine-type wireless communication will be more widely used in applications such as internet of things (IoT), smart house, virtual reality, etc. [1,2] The goal of the fifth generation (5G) of wireless communications involves achieving ubiquitous communication in networks with ultra-dense device allocation. [3][4][5] A data consumption of nearly 5 zettabytes per month is estimated across 17 billion devices. [6] In addition, due to the outbreak of the COVID-19 pandemic, there has been a remarkable increase in remote activities in the work, health, and education areas, which will be much more frequent in the post-pandemic environment. [7] Devices connected to the wireless network use different types of service. In 5G wireless communication systems, a clear division into three main use modes was defined: [8] enhanced mobile broadband (eMBB) for devices that require high data rates, such as augmented reality users; ultra-reliable low-latency communications (URLLC) for applications that require 99.999% communication reliability, such as remote surgery, while holding end-to-end latency below 1