2022
DOI: 10.48550/arxiv.2204.00498
Preprint
Evaluating the Text-to-SQL Capabilities of Large Language Models

Abstract: We perform an empirical evaluation of Text-to-SQL capabilities of the Codex language model. We find that, without any finetuning, Codex is a strong baseline on the Spider benchmark; we also analyze the failure modes of Codex in this setting. Furthermore, we demonstrate on the GeoQuery and Scholar benchmarks that a small number of in-domain examples provided in the prompt enables Codex to perform better than state-of-the-art models finetuned on such few-shot examples.
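
To make the abstract's setup concrete, here is a minimal sketch (not from the paper) of a few-shot Text-to-SQL prompt: the database schema is serialized as CREATE TABLE statements, a handful of in-domain question/SQL pairs are prepended, and the model is left to complete the SQL for the new question. The schema, table names, and example pairs below are hypothetical, loosely in the style of GeoQuery.

```python
# Hypothetical few-shot Text-to-SQL prompt construction. The schema and
# question/SQL pairs are illustrative only; they are not from the paper.

SCHEMA = """CREATE TABLE city (city_name TEXT, state_name TEXT, population INT);
CREATE TABLE state (state_name TEXT, capital TEXT, area REAL);"""

FEW_SHOT_EXAMPLES = [
    ("What is the capital of Texas?",
     "SELECT capital FROM state WHERE state_name = 'texas';"),
    ("How many people live in Houston?",
     "SELECT population FROM city WHERE city_name = 'houston';"),
]

def build_prompt(question: str) -> str:
    """Assemble a few-shot Text-to-SQL prompt for a code LLM such as Codex."""
    parts = [SCHEMA, ""]
    for q, sql in FEW_SHOT_EXAMPLES:
        parts += [f"-- Question: {q}", sql, ""]
    parts += [f"-- Question: {question}", "SELECT"]  # model completes from here
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_prompt("Which states have an area above 100000?"))
```

In the zero-shot variant the paper evaluates on Spider, the example pairs would simply be omitted, leaving only the serialized schema and the question.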

Cited by 17 publications (30 citation statements)
References 7 publications (9 reference statements)
“…More recently, large language models (LLMs) like GPT-3 [2] and Codex [3] have been shown to perform incredibly well in many NLP tasks without any training. [23] demonstrates Codex's near state-of-the-art performance on Spider in a zero-shot setting when prompted with in-context examples.…”
Section: Natural Language to SQL Translation
confidence: 92%
“…Translation from natural language to SQL (Text-to-SQL) has been widely studied by the NLP community [21,23,24,27,31,34]. Difficulties in text-to-SQL are mainly two-fold: encoding a variety of complex relationships between the user's query and multiple tables, and decoding the SQL with valid representations.…”
Section: Natural Language to SQL Translation
confidence: 99%
“…Recent large pretrained models can perform the task without task-specific architectures (Scholak et al., 2021b) or even in a zero/few-shot manner (Shin et al., 2021; Brown et al., 2020; Chen et al., 2021a). Rajkumar et al. (2022) evaluate Codex's text-to-SQL capability.…”
Section: Related Work
confidence: 99%
“…A seed semantic parser that is likely to generate a short list of candidates that contain the correct program. This requirement is not hard to satisfy in many applications, given that large language models often achieve high top-k accuracy on generating simple Python snippets (Chen et al., 2021a), JSON data (Poesia et al., 2022), Lispress (Shin et al., 2021) and SQL programs (Scholak et al., 2021b; Rajkumar et al., 2022) with only a few training examples and are likely to continue improving (Kaplan et al., 2020). For example, we achieved 95% top-32 accuracy on SPIDER without any task-specific engineering beyond few-shot prompting (e.g., specialized architectures (Wang et al., 2020), decoding constraints (Scholak et al., 2021b), etc.).…”
Section: Related Work
confidence: 99%
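
The "short list of candidates" idea in the last excerpt can be made concrete with an execution-based filter: sample k SQL candidates from the few-shot-prompted model and discard any the database rejects. The sketch below is hypothetical; the generate_candidates stub stands in for a real model call, and this is not the cited authors' implementation.

```python
# Hypothetical candidate-filtering sketch: keep only the sampled SQL
# candidates that the database accepts. Runs as-is with built-in sqlite3.
import sqlite3

def generate_candidates(question: str, k: int = 32) -> list[str]:
    """Stand-in for a top-k sample from a few-shot-prompted code LLM."""
    return [
        "SELECT capital FROM state WHERE state_name = 'texas';",  # valid
        "SELECT capitol FROM state WHERE state_name = 'texas';",  # bad column
    ]

def executable_candidates(candidates: list[str],
                          db: sqlite3.Connection) -> list[str]:
    """Filter the candidate list down to queries the database accepts."""
    kept = []
    for sql in candidates:
        try:
            db.execute(sql)
            kept.append(sql)
        except sqlite3.Error:
            pass  # discard candidates that fail to parse or execute
    return kept

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE state (state_name TEXT, capital TEXT)")
    print(executable_candidates(generate_candidates("capital of Texas?"), db))
```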