“…We evaluate in both the end-to-end and policy optimization settings. The models covered include UBAR (Nekvinda and Dusek, 2021), PPTOD (Su et al., 2022), RSTOD (Cholakov and Kolev, 2022), BORT (Sun et al., 2022a), MTTOD (Lee, 2021), HDNO (Wang et al., 2020a), GALAXY, MarCO (Wang et al., 2020b), Mars (Sun et al., 2022b), and KRLS. To obtain database search results in the end-to-end setting, we use MTTOD's dialogue state tracker, which is trained jointly during fine-tuning.…”