The massive machine-type communications (mMTC) service is among the new services planned for beyond-fifth-generation wireless communication. In mMTC, thousands of devices sporadically access the available resource blocks on the network. In this scenario, the massive random access problem arises: two or more devices collide when they select the same resource block. There are several techniques to deal with this problem. One of them deploys Q-learning (QL), in which devices store in their Q-table the rewards sent by the central node, which indicate the quality of the transmission performed. Each device thereby learns which resource blocks are best to select for transmission in order to avoid collisions. We propose a multipower-level QL (MPL-QL) algorithm that uses a nonorthogonal multiple access (NOMA) transmit scheme to generate transmission power diversity and accommodate more than one device in the same time-slot, as long as the signal-to-interference-plus-noise ratio (SINR) exceeds a threshold value. The numerical results reveal that the best performance-complexity trade-off is obtained by using a higher number of power levels, typically eight levels. The proposed MPL-QL delivers better throughput and lower latency than other recent QL-based algorithms found in the literature.
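The QL-based random access idea summarized above can be illustrated with a minimal Python sketch. It is not the proposed MPL-QL algorithm: the update rule, the rewards of +1 and -1, the learning rate `alpha`, the noise power, and all function names (`sinr_ok`, `update_q`, `choose_slot`, `simulate`) are assumptions made here for illustration only. The helper `sinr_ok` shows the kind of SINR threshold test that a NOMA scheme could use to admit more than one device in a slot, while `simulate` uses a plain collision model.

```python
import random


def sinr_ok(powers, idx, noise=1e-3, threshold=1.0):
    """Check whether device `idx` exceeds the SINR threshold in a shared slot.

    `powers` holds the received powers of all devices in the slot
    (linear scale). The noise power and threshold are assumed values.
    """
    interference = sum(p for j, p in enumerate(powers) if j != idx)
    return powers[idx] / (interference + noise) > threshold


def update_q(q_row, slot, reward, alpha=0.1):
    """Move the Q-value of `slot` toward the reward sent by the central node."""
    q_row[slot] += alpha * (reward - q_row[slot])


def choose_slot(q_row, rng):
    """Pick the slot with the highest Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([s for s, q in enumerate(q_row) if q == best])


def simulate(num_devices=4, num_slots=4, frames=200, seed=0):
    """Run a collision-only QL random access loop (no power levels)."""
    rng = random.Random(seed)
    q = [[0.0] * num_slots for _ in range(num_devices)]
    picks = []
    for _ in range(frames):
        picks = [choose_slot(q[d], rng) for d in range(num_devices)]
        for d, s in enumerate(picks):
            alone = picks.count(s) == 1  # success only if the slot is not shared
            update_q(q[d], s, 1.0 if alone else -1.0)
    return q, picks


if __name__ == "__main__":
    q_table, last_picks = simulate()
    print("slots chosen in the last frame:", last_picks)
    # With two devices in one slot, only the stronger one passes the test here.
    print(sinr_ok([1.0, 0.1], 0), sinr_ok([1.0, 0.1], 1))
```

In this sketch a device keeps returning to a slot that earned positive rewards and drifts away from slots where it collided, which is the behavior the Q-table is meant to capture. A multipower-level variant would presumably extend the action to a slot and power-level pair and replace the `alone` test with an SINR check, but those details are not specified in this section.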
INTRODUCTION
Machine-type wireless communication will be more widely used in applications such as the internet of things (IoT), smart houses, virtual reality, etc. [1,2] The goals of the fifth generation (5G) of wireless communications include achieving ubiquitous communication in networks with ultra-dense device deployment. [3][4][5] A data consumption of nearly 5 zettabytes per month is estimated across 17 billion devices. [6] In addition, due to the outbreak of the COVID-19 pandemic, there has been a remarkable increase in remote activities in the areas of work, health, and education, which will be much more frequent in the post-pandemic environment. [7] Devices connected to the wireless network use different types of service. In 5G wireless communication systems, a clear division into three main use modes was defined: [8] enhanced mobile broadband (eMBB) for devices that require high data rates, such as an augmented reality user; ultra-reliable low-latency communications (URLLC) for applications that require 99.999% communication reliability, such as remote surgery, while holding end-to-end latency below 1