“…AoI-optimal scheduling has attracted significant interest from the research community over the last few years [4]-[27]. In particular, a popular approach is to model the problem as an MDP and find an optimal policy using either model-based reinforcement learning (RL) methods based on dynamic programming [4], [5], [9]-[12], [15], [16], [18]-[20], [26], [27], e.g., the relative value iteration algorithm (RVIA), or model-free RL methods [4], [9], [10], [14], [21], [22], e.g., (deep) Q-learning, or both.…”
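To make the dynamic-programming route concrete, the following is a minimal sketch of RVIA on a hypothetical toy AoI scheduling MDP. It is not the model used in any of the cited works. The assumptions are: a single source whose age is capped at `a_max`; two actions (idle or transmit); a per-slot cost equal to the current age plus `tx_cost` when transmitting; and a transmission that succeeds with probability `p_success`, resetting the age to 1, and otherwise lets the age grow. The function name `rvia_aoi` and all parameters are illustrative. Each iteration applies the Bellman operator and subtracts its value at a fixed reference state (age 1), so the subtracted quantity converges to the optimal average cost.

```python
def rvia_aoi(a_max=10, p_success=0.8, tx_cost=5.0, tol=1e-9, max_iter=100_000):
    """RVIA for a hypothetical single-source AoI MDP (illustrative sketch).

    State: age a in {1, ..., a_max}. Actions: 0 = idle, 1 = transmit.
    Stage cost: a + tx_cost * action. A transmission succeeds w.p. p_success
    (age resets to 1); otherwise the age increases by one, capped at a_max.
    Returns (average cost g, relative values h indexed by age - 1, policy).
    """
    ages = range(1, a_max + 1)

    def next_age(a):
        # Age after a slot without a successful delivery (capped).
        return min(a + 1, a_max)

    def q_values(h, a):
        # Expected cost-to-go of idling and of transmitting in state a.
        stay = h[next_age(a) - 1]
        idle = a + stay
        tx = a + tx_cost + p_success * h[0] + (1 - p_success) * stay
        return idle, tx

    ref = 0  # reference state: age 1 (index 0)
    h = [0.0] * a_max
    g = 0.0
    for _ in range(max_iter):
        w = [min(q_values(h, a)) for a in ages]  # Bellman update (T h)
        g = w[ref]                               # average-cost estimate
        h_new = [x - g for x in w]               # pin h(ref) to zero
        converged = max(abs(x - y) for x, y in zip(h_new, h)) < tol
        h = h_new
        if converged:
            break

    # Greedy policy with respect to the converged relative values.
    policy = [1 if q_values(h, a)[1] < q_values(h, a)[0] else 0 for a in ages]
    return g, h, policy
```

Two sanity cases follow directly from the model. With `tx_cost=0.0`, transmitting is never worse, so the policy transmits in every state and the average age is close to `1 / p_success`. With a prohibitive `tx_cost`, the scheduler never transmits and the age settles at `a_max`. A model-free method such as Q-learning would estimate the same Q-values from samples instead of using the known transition probabilities.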