Adopting multiband transmission in optical networks can cost-effectively increase network capacity without deploying new fibre. In this paper, we focus on the solutions the research community has explored for resource allocation in dynamic multiband elastic optical networks. We start by summarising the main challenges in designing ad-hoc heuristics and the contributions made so far. Next, we review the few recent approaches based on deep reinforcement learning and evaluate how effectively different techniques improve their performance. We also discuss possible future directions for research in the area.