Engaging User Simulators and Assistant systems in synthetically generated dialogues yields a natural reward function, and many works have used simulators to optimize a Reinforcement Learning policy (Schatzmann et al., 2007; Fazel-Zarandi et al., 2017; Peng et al., 2017; Su et al., 2018; Gür et al., 2018; Kreyssig et al., 2018). Such approaches are particularly common for optimizing the policy component of pipeline-based systems (Fazel-Zarandi et al., 2017), and they frequently rely on Natural Language Generation (NLG) templates over dialogue acts (Fazel-Zarandi et al., 2017; Shi et al., 2019; Kreyssig et al., 2018; Acharya et al., 2021). Our work instead uses fully lexicalized, E2E models for both the User and the Assistant, without the need for agendas, dialogue acts, or NLG templates.
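To make the contrast concrete, the sketch below shows what template-based NLG over dialogue acts typically looks like in a pipeline system: the policy emits a structured act, and a hand-written template lexicalizes it into text. The `DialogueAct` class, `TEMPLATES` table, and `realize` function are hypothetical names for illustration, not drawn from any of the cited systems; a fully lexicalized E2E model instead maps the dialogue history directly to surface text, with no such intermediate act or template.

```python
# Hypothetical sketch of template-based NLG over dialogue acts, the
# pipeline-style approach contrasted above. All names here are
# illustrative, not taken from any cited system.

from dataclasses import dataclass, field


@dataclass
class DialogueAct:
    """A structured act such as inform(food=thai, area=centre)."""
    intent: str
    slots: dict = field(default_factory=dict)


# Hand-written surface templates keyed by intent; slot values are
# lexicalized by string substitution at generation time.
TEMPLATES = {
    "inform": "I found a {food} restaurant in the {area}.",
    "request": "What {slot} are you looking for?",
}


def realize(act: DialogueAct) -> str:
    """Map a dialogue act to an utterance via its intent's template."""
    return TEMPLATES[act.intent].format(**act.slots)


if __name__ == "__main__":
    act = DialogueAct("inform", {"food": "Thai", "area": "centre"})
    print(realize(act))  # -> "I found a Thai restaurant in the centre."
```

The appeal of this design is that the RL policy only chooses among a small set of acts; the cost is that every surface form must be authored by hand, which the fully lexicalized E2E setup avoids.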