“…In the field of machine learning (ML), reinforcement learning (RL) [11] is particularly influential for tasks such as choosing optimal CW values. Key researchers in this field, including Kim & Hwang [12], Zerguine et al. [13], Kwon et al. [14], Pan et al. [15], Lee et al. [16], and Zheng et al. [17], have leveraged the Q-learning algorithm, a foundational RL method, to determine an appropriate CW value according to consecutive successful transmissions or collisions of transmitted packets. Nevertheless, Q-learning has the drawback of being expensive for the agent: in the early stages of the learning phase, every state-action pair must be explored exhaustively before the agent converges towards the optimal policy.…”
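The following is a minimal tabular Q-learning sketch for CW selection that makes the mechanism concrete. It is not the method of any of the cited works. The CW ladder (`CW_VALUES`), the three-action set, the toy collision model in `simulate_transmission`, and the reward with its `delay_penalty` term are all illustrative assumptions. The `visits` table illustrates the drawback described in the excerpt: each entry of the Q-table is learned only by actually trying that state-action pair.

```python
import random

# Hypothetical CW ladder (assumption): 802.11-style values of the form 2^k - 1.
CW_VALUES = [15, 31, 63, 127, 255, 511, 1023]
# Hypothetical action set: move one step down, stay, or move one step up the ladder.
ACTIONS = (-1, 0, 1)


def step(state, action_idx):
    """Apply an action to the CW index, clamped to the valid range."""
    return min(max(state + ACTIONS[action_idx], 0), len(CW_VALUES) - 1)


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice over the Q-table row for `state`."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    row = q[state]
    return max(range(len(ACTIONS)), key=lambda a: row[a])


def q_update(q, s, a, reward, s_next, alpha, gamma):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])


def simulate_transmission(cw, n_stations, rng):
    """Toy channel model (assumption): success if no other station picks our backoff slot."""
    p_success = (1.0 - 1.0 / (cw + 1)) ** (n_stations - 1)
    return rng.random() < p_success


def train(steps=5000, n_stations=10, alpha=0.1, gamma=0.9,
          epsilon=0.1, delay_penalty=0.5, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in CW_VALUES]
    visits = [[0] * len(ACTIONS) for _ in CW_VALUES]  # times each (s, a) pair was tried
    state = 0
    for _ in range(steps):
        a = choose_action(q, state, epsilon, rng)
        visits[state][a] += 1
        s_next = step(state, a)
        cw = CW_VALUES[s_next]
        success = simulate_transmission(cw, n_stations, rng)
        # Reward success, punish collision, and charge a cost for long backoff (assumption).
        reward = (1.0 if success else -1.0) - delay_penalty * cw / CW_VALUES[-1]
        q_update(q, state, a, reward, s_next, alpha, gamma)
        state = s_next
    return q, visits


if __name__ == "__main__":
    q, visits = train()
    for i, cw in enumerate(CW_VALUES):
        print(cw, [round(v, 2) for v in q[i]], visits[i])
```

Even this toy table has 7 × 3 = 21 state-action pairs. A richer state definition (for example, one that also encodes recent success/collision history) multiplies the table size, which is why exhaustive exploration in the early learning phase becomes costly.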