Sequence-to-sequence (seq2seq) models have achieved great success in semantic parsing tasks, but they tend to struggle on out-of-distribution (OOD) data. Despite recent progress, robust semantic parsing on large-scale tasks that combine the challenges of compositional generalization and natural language variation remains unsolved. To encourage research in this area, this work introduces CUDON, a large-scale Chinese-language dialogue dataset created specifically to evaluate the compositional generalization of semantic parsers. The dataset contains about ten thousand complex multi-turn queries and provides multiple splits with different degrees of train-test distribution divergence. We investigate improving compositional generalization on this dataset through grammar-based decoding. With specially designed grammars that leverage the program schema, we significantly improve the accuracy of seq2seq semantic parsers on the OOD splits: an LSTM-based parser using a context-free grammar (CFG) achieves over 25% higher accuracy than a standard seq2seq baseline, and a parser using a tree-substitution grammar (TSG) parses five to seven times faster than the CFG parser with only a small loss in accuracy. The grammar-based LSTM parsers also outperform BART- and T5-based seq2seq parsers on the OOD splits, despite having fewer than one tenth of the parameters and no pretraining. We also validate our approach on the SMCalFlow-CS dataset, specifically on the zero-shot learning task.
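As a rough illustration of what grammar-based decoding means in general, the sketch below expands a toy CFG top-down and lets a scoring function choose only among the productions that are valid for the current nonterminal, so every output is well-formed by construction. The grammar, the `grammar_decode` function, and the `toy_score` scorer are all hypothetical stand-ins invented for this example; they are not the grammars, schema, or LSTM decoder used in this work, where a trained model would supply the scores.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, ...]

# Hypothetical toy grammar (an assumption for illustration only):
# each nonterminal maps to the productions that may expand it.
GRAMMAR: Dict[str, List[Rule]] = {
    "Query": [("find", "(", "Entity", ")"), ("count", "(", "Query", ")")],
    "Entity": [("restaurant",), ("hotel",)],
}


def grammar_decode(
    score: Callable[[List[str], str, Rule], float],
    start: str = "Query",
    max_expansions: int = 50,
) -> List[str]:
    """Greedy leftmost derivation: only grammar-valid rules are ever scored."""
    stack: List[str] = [start]  # symbols still to process, top = leftmost
    output: List[str] = []      # terminals emitted so far
    expansions = 0
    while stack:
        symbol = stack.pop()
        if symbol not in GRAMMAR:
            # Terminal symbol: emit it unchanged.
            output.append(symbol)
            continue
        if expansions >= max_expansions:
            raise RuntimeError("derivation exceeded max_expansions")
        expansions += 1
        # The scorer (a stand-in for a neural decoder step) ranks only
        # the productions licensed for this nonterminal.
        best = max(GRAMMAR[symbol], key=lambda rule: score(output, symbol, rule))
        # Push the right-hand side reversed so its first symbol is popped next.
        stack.extend(reversed(best))
    return output


def toy_score(prefix: List[str], nonterminal: str, rule: Rule) -> float:
    """Hypothetical scorer: wrap in `count` once, then pick `find` and `hotel`."""
    if nonterminal == "Query":
        if rule[0] == "count":
            return 1.0 if not prefix else 0.0
        return 0.5
    return 1.0 if rule == ("hotel",) else 0.0


print(" ".join(grammar_decode(toy_score)))
```

Because the scorer never sees an invalid production, the decoder cannot emit an ill-formed program, which is the basic property that grammar-based decoding relies on regardless of the specific grammar formalism.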