2022
DOI: 10.1609/aaai.v36i10.21341

SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems

Abstract: Zero/few-shot transfer to unseen services is a critical challenge in task-oriented dialogue research. The Schema-Guided Dialogue (SGD) dataset introduced a paradigm for enabling models to support any service in zero-shot through schemas, which describe service APIs to models in natural language. We explore the robustness of dialogue systems to linguistic variations in schemas by designing SGD-X - a benchmark extending SGD with semantically similar yet stylistically diverse variants for every schema. We observe…

Cited by 7 publications (25 citation statements)
References 32 publications
“…Since SGSAcc uses schema information to construct candidate references, we also validated that SGSAcc is robust to different schema writing styles, as shown by the consistently high F1-score (>0.95) on distinguishing faithful and unfaithful utterances with the rephrased schema in the SGD-X extension (Lee et al, 2021) of SGD (See Table 2 in Appendix A).…”
Section: SGSAcc Evaluation (mentioning)
confidence: 61%
“…Since SGSAcc uses the slot description in service schema to construct entailment reference, we check its robustness to different schema writing styles so that it can be used to evaluate a variety of services with heterogeneous interfaces. We use the SGD-X dataset (Lee et al, 2021), which contains five versions of schema rephrased from the original SGD to test whether SGSAcc is sensitive to writing styles.…”
Section: A Robustness Against Schema Writing Styles (mentioning)
confidence: 99%
“…A task-specific training (e.g., reserving a table) is performed in the first phase. Task-specific training datasets are generally available for a wide range of tasks in many domains [25,62], whereas personalized counterparts are practically impossible to obtain. To overcome this challenge, we employ the unsupervised personalization phase.…”
Section: Task-specific Training (mentioning)
confidence: 99%
“…Large LMs are often sensitive to the choice of prompt (Zhao et al, 2021b; Reynolds and McDonell, 2021). To this end, we evaluate SDT-seq on the SGD-X (Lee et al, 2021b) benchmark, comprising 5 variants with paraphrased slot names and descriptions for every schema (Appendix Figure 4). Note that SDT-seq only makes use of slot names, so variations in description have no effect on it.…”
Section: Robustness (mentioning)
confidence: 99%
“…Also, descriptions only provide indirect supervision about how to interact with a service compared to an example. Furthermore, Lee et al (2021b) showed that schema-guided DST models are not robust to variations in schema descriptions, causing significant quality drops.…”
Section: Introduction (mentioning)
confidence: 99%