“…These systems are the core modules of virtual assistants (e.g., Apple Siri and Amazon Alexa), and they provide natural language interfaces for online services [1]. Recently, there has been growing interest in developing deep learning-based end-to-end ToD systems [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] because they can handle complex dialogue patterns with minimal hand-crafted rules. To advance the state of the art, large-scale datasets [17,1,16] have been proposed for training and evaluating such data-driven systems.…”