2021
DOI: 10.48550/arxiv.2106.01501
Preprint

Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins

Sahaana Suri,
Ihab F. Ilyas,
Christopher Ré
et al.

Abstract: Structured data, or data that adheres to a pre-defined schema, can suffer from fragmented context: information describing a single entity can be scattered across multiple datasets or tables tailored for specific business needs, with no explicit linking keys (e.g., primary key-foreign key relationships or heuristic functions). Context enrichment, or rebuilding fragmented context, using keyless joins is an implicit or explicit step in machine learning (ML) pipelines over structured data sources. This process is …
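The abstract's notion of a keyless join — linking rows across tables that share no primary/foreign key — can be illustrated with a toy sketch. This is not Ember's actual method (the paper builds on learned similarity, not edit distance), and all table contents and field names below are invented for illustration:

```python
# Hypothetical sketch of a similarity-based keyless join: pair each left row
# with its most lexically similar right row, since no shared key exists.
# Real systems like Ember use learned embeddings instead of string similarity.
from difflib import SequenceMatcher

def keyless_join(left, right, left_key, right_key, threshold=0.5):
    """Join each left row to the most similar right row above a threshold."""
    joined = []
    for l in left:
        best_row, best_score = None, threshold
        for r in right:
            score = SequenceMatcher(
                None, l[left_key].lower(), r[right_key].lower()
            ).ratio()
            if score > best_score:
                best_row, best_score = r, score
        if best_row is not None:
            joined.append({**l, **best_row, "similarity": round(best_score, 2)})
    return joined

# Two fragments of context about the same entities, with no linking key.
products = [{"desc": "Apple iPhone 13 Pro 128GB"},
            {"desc": "Samsung Galaxy S21"}]
listings = [{"title": "iPhone 13 Pro (128 GB)", "price": 999},
            {"title": "Galaxy S21 5G", "price": 799}]

result = keyless_join(products, listings, "desc", "title")
```

The nested loop makes this quadratic in table size; the point of similarity-based approaches is to replace both the hand-written similarity function and the exhaustive comparison with learned representations and approximate nearest-neighbor search.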

Cited by 3 publications (5 citation statements)
References 36 publications
“…Techniques for data integration [9], [10], [11], [36], [39], [40], [41], [42] generally aim to automatically discover, select and aggregate related data in order to extend a given dataset. Many of the approaches deal with tabular data.…”
Section: Data Integration (mentioning)
confidence: 99%
“…Neural style transfer [6], generative modeling techniques such as variational autoencoders (VAEs) [7] and generative adversarial networks (GANs) [8] have also been extensively used to generate synthetic data for training deep learning models. Another way to augment training data is by integrating existing data from several sources (e.g., in [9], [10], [11]). This is a useful way to leverage the large quantities of data available in various forms on the internet and other sources.…”
(mentioning)
confidence: 99%
“…While larger language models have significantly increased the accuracy on that task, they also enable entirely new applications. Here, the tutorial will cover recent research leveraging language models for tasks such as data preparation and integration [2,74,75], fact checking from data [10, 25, 33-40, 81, 82], or database tuning [78-80, 85-87].…”
Section: Applications In Data Management (mentioning)
confidence: 99%
“…Specifically, the tutorial will cover novel ways of representing data using language models (e.g., by storing data as natural language facts [77] or by integrating data within the language model [26]). Also, it will discuss the use of language models in the execution engine (e.g., to implement operators [74,77] or to synthesize code for data processing [84]).…”
Section: Applications In Data Management (mentioning)
confidence: 99%
“…CodexDB relates to prior work exploiting machine learning [6,7,13] and specifically Transformers [20,21] in the context of database systems. It connects broadly to prior work using GPT-3 for program synthesis [5,11,12].…”
Section: Background and Related Work (mentioning)
confidence: 99%