Robust Dialogue State Tracking with Weak Supervision and Sparse Data

Heck, Michael; Lubis, Nurul; Niekerk, Carel van; Feng, Shutong; Geishauser, Christian; Lin, Hsin-Ying; Gašić, Milica

doi:10.1162/tacl_a_00513

Cited by 7 publications

(9 citation statements)

References 37 publications

“…As shown in Figure 1, experimental results on the MultiWOZ benchmark (Budzianowski et al, 2018) represent a significant milestone. Our approach is the first that, without further fine-tuning, enables modestly sized open-source LLMs (7B or 13B parameters) to achieve comparable or superior performance compared to previous state-of-the-art (SOTA) prompting methods that relied exclusively on advanced proprietary LLMs such as ChatGPT and Codex (Hudeček and Dušek, 2023; Heck et al, 2023; …). Furthermore, our approach beats the previous zero-shot SOTA by 5.6% Av.…”

Section: Zero-shot DST Paradigms (mentioning)

confidence: 79%

“…(2) Previous prompting approaches that have only shown efficacy with advanced ChatGPT and Codex. These include IC-DST (Hu et al, 2022) using Codex, (Heck et al, 2023) and InstructTODS using ChatGPT (GPT-3.5/4).…”

Section: Methods (mentioning)

confidence: 99%

“…Previous prompting approaches (Heck et al, 2023; Peng et al, 2020; Su et al, 2021) necessitate training on curated domain-specific annotated data, a process that is notoriously costly and labor-intensive. Despite efforts in automated dataset creation using GPT-3 (Li et al, 2022), these methods struggle to generalize to unseen domains.…”

Section: Zero-shot DST Paradigms (mentioning)

confidence: 99%

“…LLMs exhibit remarkable capabilities for tackling various tasks without the need for task-specific fine-tuning, making them suited for zero-shot DST. However, while there have been initiatives to leverage ChatGPT for zero-shot DST (Hu et al, 2022; Hudeček and Dušek, 2023; Heck et al, 2023; …), these methods tend to treat DST as a standalone task rather than chat completion, which the models, especially chat-tuned models, are more proficient in. They usually take the whole conversation as input along with detailed instructions to generate in domain-specific formats.…”

Section: Zero-shot DST Paradigms (mentioning)

confidence: 99%

“…TOD differs from general conversation in that it requires models to not only generate responses but also track dialogue states according to domain-specific schemas. While ChatGPT has shown effectiveness in response generation within TOD (Li et al, 2023c), their performance of zero-shot DST, as explored in recent research on prompting approaches (Hu et al, 2022; Bang et al, 2023; Hudeček and Dušek, 2023; Heck et al, 2023; …), are still not satisfying, which remains a significant challenge.…”

Section: Leveraging LLMs for Dialogue Tasks (mentioning)

confidence: 99%

Introduction: sustainable Logistics Systems using AI-based Meta-Heuristics Approaches

2023

Sustainable Logistics Systems Using AI-based Meta-Heuristics Approaches

Large language models (LLMs) are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogues (TOD), which requires not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains less satisfying. In this work, we propose a novel approach FNCTOD for solving DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that our approach achieves exceptional performance with both modestly sized open-source and also proprietary LLMs: with in-context prompting it enables various 7B or 13B parameter models to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and improves ChatGPT's performance beating the SOTA by 5.6% Avg. JGA. Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities. We will open-source experimental code and model.
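The abstract reports its gains in "Avg. JGA". JGA usually stands for joint goal accuracy: the fraction of dialogue turns whose predicted state matches the gold state exactly, across all slots. The following is a minimal sketch of that metric under stated assumptions. The function names, the `"domain-slot"` key format, the lowercase normalization, and the toy hotel dialogue are all illustrative choices, not the evaluation script used by the cited papers.

```python
from typing import Dict, List

# A dialogue state maps "domain-slot" keys to values (hypothetical format).
State = Dict[str, str]


def normalize(state: State) -> State:
    """Lowercase and strip keys and values (an illustrative normalization)."""
    return {k.strip().lower(): v.strip().lower() for k, v in state.items()}


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns where the whole predicted state equals the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy three-turn dialogue: the second turn has one wrong slot value,
# so that whole turn counts as incorrect.
gold_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
predicted_states = [
    {"hotel-area": "North"},
    {"hotel-area": "north", "hotel-stars": "3"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]

jga = joint_goal_accuracy(predicted_states, gold_states)
print(f"JGA = {jga:.3f}")
```

Because a single wrong slot invalidates the entire turn, JGA is a strict metric, which is one reason differences of a few points are treated as meaningful.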
