“…But current DRL-based dialogue policy approaches still mostly rely on brute-force training with random sampling, improving their performance at the expense of high interaction costs (Jiang et al, 2015; Ren et al, 2018; Narvekar and Stone, 2019; Narvekar et al, 2020). Inspired by human education, a novel training paradigm, curriculum learning (CL), has been proposed to improve learning performance and efficiency by training a model on a designed sequence of training tasks rather than on arbitrarily sampled ones (Svetlik et al, 2017; Fan et al, 2018; Racanière et al, 2019; Green et al, 2019). Although many empirical studies have demonstrated the beneficial effects of CL, reports on its use for dialogue policy learning remain very limited (Zhao et al, 2021a; Liu et al, 2021).…”