“…Recent developments in deep learning have led to end-to-end approaches to dialogue using supervised learning, such as sequence-to-sequence models (Dušek and Jurcicek, 2016;Eric and Manning, 2017), hierarchical models (Serban et al, 2017), attention (Mei et al, 2017;Chen et al, 2019), andTransformer-based models (Wu et al, 2021;Hosseini-Asl et al, 2020;Peng et al, 2020;Adiwardana et al, 2020). However, supervised learning only allows an agent to imitate behaviors, requires optimal data, and does not allow agents to exceed human performance.…”