Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing

Suhr, Alane; Chang, Ming‐Wei; Shaw, Peter; Lee, Kenton

doi:10.18653/v1/2020.acl-main.742

Cited by 69 publications

(100 citation statements)

References 49 publications


“…(2) On more realistic evaluation settings, including Spider-Realistic and the Suhr et al (2020) datasets, our method outperforms all baselines. This demonstrates the superiority of our pretraining framework in solving the text-table alignment challenge, and its usefulness in practice.…”

Section: Introduction (mentioning)

confidence: 96%

“…As pointed out by Suhr et al (2020), existing text-to-SQL benchmarks like Spider (Yu et al, 2018b) render the text-table alignment challenge easier than expected by explicitly mentioning exact column names in the NL utterances. Contrast this to more realistic settings where users may refer to the columns using a variety of expressions.…”

Section: Introduction (mentioning)

confidence: 99%

“…Contrast this to more realistic settings where users may refer to the columns using a variety of expressions. Suhr et al (2020) propose a new cross-database setting that uses Spider for training and includes eight other single-domain text-to-SQL datasets for evaluation. In addition to adopting their setting, we create a new evaluation set called Spider-Realistic from the original Spider dev set, by removing explicit mentions of column names from an utterance.…”

Section: Introduction (mentioning)

confidence: 99%


Structure-Grounded Pretraining for Text-to-SQL

Deng, Awadallah, Meek et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua


Learning to capture text-table alignment is essential for tasks like text-to-SQL. A model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel pretraining tasks: column grounding, value grounding and column-value mapping, and leverage them to pretrain a text-table encoder. Additionally, to evaluate different methods under more realistic text-table alignment settings, we create a new evaluation set Spider-Realistic based on Spider dev set with explicit mentions of column names removed, and adopt eight existing text-to-SQL datasets for cross-database evaluation. STRUG brings significant improvement over BERT LARGE in all settings. Compared with existing pretraining methods such as GRAPPA, STRUG achieves similar performance on Spider, and outperforms all baselines on more realistic sets. All the code and data used in this work is publicly available at https://aka.ms/strug.


“…More recently, large-scale datasets consisting of hundreds of DBs and the corresponding question-SQL pairs have been released (Zhong et al, 2017; Yu et al, 2019b,a) to encourage the development of semantic parsers that can work well across different DBs (Guo et al, 2019; Bogin et al, 2019b; Wang et al, 2019; Suhr et al, 2020; Choi et al, 2020). The setup is challenging as it requires the model to interpret a question conditioned on a relational DB unseen during training and accurately express the question intent via SQL logic.…”

Section: Introduction (mentioning)

confidence: 99%

Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing

Lin, Socher, Xiong 2020

Findings of the Association for Computational Linguistics: EMNLP 2020


We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing. BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. The hybrid sequence is encoded by BERT with minimal subsequent layers and the text-DB contextualization is realized via the fine-tuned deep attention in BERT. Combined with a pointer-generator decoder with schema-consistency driven search space pruning, BRIDGE attained state-of-the-art performance on the well-studied Spider benchmark (65.5% dev, 59.2% test), despite being much simpler than most recently proposed models for this task. Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks. Our implementation is available at https://github.com/salesforce/TabularSemanticParsing.


“…However, it is still difficult for current state-of-the-art models to fill in the skeletons with semantically correct entities, especially when they are required to generalize to unseen DB schemas (Yu et al, 2018; Suhr et al, 2020). To predict the correct entity, the model should have a database (DB) schema grounded understanding of the NL question, which means that the model should be able to jointly learn the semantics in the NL question and the structured knowledge in a given database.…”

Section: Introduction (mentioning)

confidence: 99%

A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing

Chen, San, Xiao-dong et al. 2020

Proceedings of the 28th International Conference on Computational Linguistics


In Text-to-SQL semantic parsing, selecting the correct entities (tables and columns) for the generated SQL query is both crucial and challenging; the parser is required to connect the natural language (NL) question and the SQL query to the structured knowledge in the database. We formulate two linking processes to address this challenge: schema linking which links explicit NL mentions to the database and structural linking which links the entities in the output SQL with their structural relationships in the database schema. Intuitively, the effectiveness of these two linking processes changes based on the entity being generated, thus we propose to dynamically choose between them using a gating mechanism. Integrating the proposed method with two graph neural network-based semantic parsers together with BERT representations demonstrates substantial gains in parsing accuracy on the challenging Spider dataset. Analyses show that our proposed method helps to enhance the structure of the model output when generating complicated SQL queries and offers more explainable predictions.
