Several emerging applications in wireless communications require low latency together with high traffic rates and high reliability. Most state-of-the-art techniques treat latency in terms of its average, which may not directly apply to scenarios with stringent latency constraints. In this paper, we consider scheduling under a max-delay constraint, which is an NP-hard problem, and we propose a novel approach that tackles the scheduling problem by addressing the constraint directly. We consider the downlink of a multi-cell wireless communication network in which nodes communicate with users, each facing their own delay constraint on randomly arriving packets. Packets must be scheduled to meet the users' delay constraints. Our main contributions are: first, proposing a new search approach, Super State Monte-Carlo Tree Search (SS-MCTS), a version of regular MCTS modified for large-scale probabilistic environments; second, developing trained value and policy networks to reduce computational complexity; and finally, addressing the scheduling problem through a reinforcement learning framework. Our numerical results demonstrate that the proposed approach significantly improves the packet delivery rate over a baseline approach while meeting the max-delay constraint and addressing scalability, the main issue in large action-state spaces.
INDEX TERMS Monte-Carlo Tree Search, Scheduling, Reinforcement Learning, Max-Delay Constraints
• First and foremost, proposing the Super State Monte-Carlo Tree Search (SS-MCTS) method as a modified version of regular MCTS to account for delay-sensitive
VOLUME 4, 2016