CoDS: A Representative Sampling Method for Relational Databases

Buda, Teodora Sandra; Cerqueus, Thomas; Murphy, John; Kristiansen, Morten

doi:10.1007/978-3-642-40285-2_30

Cited by 3 publications (9 citation statements)

References 19 publications


“…The sizes of the tables range from 77 (District) to 1,056,320 tuples (Trans). The Financial database schema is depicted in [4]. The starting table identified by ReX is the District table.…”

Section: Discussion (mentioning), confidence: 99%

“…Both ReX and UpSizeR aim to scale the distributions of the relationships between tables by s (i.e., through primary and foreign keys). In [4] we proposed a sampling method that aimed to scale the same distributions by a sampling factor. We use the average representativeness error metric defined in [4], replacing the sampling rate with the scaling rate.…”

Section: Discussion (mentioning), confidence: 99%
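The excerpt refers to the average representativeness error from [4] without restating it. As a minimal sketch, assume a relationship distribution maps a fan-out degree (how many child tuples reference one parent tuple) to the number of parent tuples with that degree, and that a representative result at rate `rate` should hold `rate` times each original count. The functions `relationship_error` and `average_representativeness_error`, and this exact error formula, are hypothetical illustrations, not the definition given in [4].

```python
from typing import Dict, List

# Hypothetical encoding: fan-out degree -> number of parent tuples with that degree.
Distribution = Dict[int, int]


def relationship_error(original: Distribution, result: Distribution, rate: float) -> float:
    """Mean relative error of `result` counts against `rate` times the `original` counts.

    Illustrative formulation only; the exact metric defined in [4] may differ.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    errors = [
        abs(result.get(degree, 0) - rate * count) / (rate * count)
        for degree, count in original.items()
        if count > 0
    ]
    # Assumption: a degree that never occurs in the original counts as a 100% error.
    errors.extend(
        1.0
        for degree, count in result.items()
        if count > 0 and original.get(degree, 0) == 0
    )
    return sum(errors) / len(errors) if errors else 0.0


def average_representativeness_error(
    originals: List[Distribution], results: List[Distribution], rate: float
) -> float:
    """Average `relationship_error` over all foreign keys (one distribution per key)."""
    if len(originals) != len(results) or not originals:
        raise ValueError("need aligned, non-empty lists of distributions")
    per_key = [relationship_error(o, r, rate) for o, r in zip(originals, results)]
    return sum(per_key) / len(per_key)
```

Passing a sampling rate below 1 covers the sampling setting, and a natural scaling rate covers extrapolation, mirroring the excerpt's substitution of the scaling rate for the sampling rate.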

“…In [4] we proposed a sampling method that aimed to scale the same distributions by a sampling factor. We use the average representativeness error metric defined in [4], replacing the sampling rate with the scaling rate. Moreover, we use the global size error metric defined in [4] to evaluate the size of X related to O.…”

Section: Discussion (mentioning), confidence: 99%
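The global size error is likewise only named in this excerpt. A minimal sketch, assuming size is the total number of tuples across all tables (the metric in [4] may instead use bytes or another unit) and that the expected size of the result X is the rate times the size of the original O; `global_size_error` is a hypothetical name:

```python
from typing import Mapping


def global_size_error(
    original_sizes: Mapping[str, int], result_sizes: Mapping[str, int], rate: float
) -> float:
    """Relative deviation of the result's total size from `rate` times the original's.

    Hypothetical formulation: sizes are tuple counts keyed by table name.
    """
    expected = rate * sum(original_sizes.values())
    if expected <= 0:
        raise ValueError("expected size must be positive")
    actual = sum(result_sizes.values())
    return abs(actual - expected) / expected
```

For example, scaling a two-table database of 150 tuples by 2 and obtaining 330 tuples gives an error of 30 / 300 = 0.1.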

“…Thus, the method must decide for which partial number of tuples of t_j it should create new tuples. As this represents a different problem by itself [4,8], in this paper we consider only natural scaling rates. Moreover, the scenario of naturally scaling databases is commonly applicable to enterprises where it is rarely needed to extrapolate to a fraction rather than a natural number.…”

Section: ReX: Extrapolation System (mentioning), confidence: 99%
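With a natural scaling rate, every original tuple can receive the same number of new counterparts, so no fractional decision arises. The simplest illustration is naive replication: copy each tuple `rate` times with a fresh primary key and point copy k of each child at copy k of its parent, which multiplies every fan-out count by `rate`. This is a hypothetical baseline, not ReX's algorithm; it assumes non-negative integer keys, a single-column primary key per table (given in `primary_keys`), and foreign keys given as hypothetical `(child_table, column, parent_table)` triples.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[int]]
Database = Dict[str, List[Row]]


def replicate(
    db: Database,
    primary_keys: Dict[str, str],
    foreign_keys: List[Tuple[str, str, str]],
    rate: int,
) -> Database:
    """Scale `db` by a natural `rate` through naive replication (illustrative only)."""
    if not isinstance(rate, int) or rate < 1:
        raise ValueError("rate must be a natural number")
    # Per-table offset larger than any existing key: copy k of key v becomes v + k * offset.
    offsets = {
        table: max((row[primary_keys[table]] for row in rows), default=-1) + 1
        for table, rows in db.items()
    }
    # Referenced table for each (child_table, column) pair.
    parent_of = {(child, column): parent for child, column, parent in foreign_keys}
    scaled: Database = {}
    for table, rows in db.items():
        pk = primary_keys[table]
        copies: List[Row] = []
        for k in range(rate):
            for row in rows:
                new_row = dict(row)  # never mutate the original tuple
                new_row[pk] = row[pk] + k * offsets[table]
                for column, value in row.items():
                    parent = parent_of.get((table, column))
                    if parent is not None and value is not None:
                        # Copy k of a child references copy k of its parent,
                        # so each parent copy keeps the original fan-out.
                        new_row[column] = value + k * offsets[parent]
                copies.append(new_row)
        scaled[table] = copies
    return scaled
```

A fractional rate such as 1.5 would instead require choosing which tuples get an extra copy, which is the separate problem the excerpt sets aside.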

“…In this paper, we propose an automated representative extrapolation technique, ReX, that addresses the scaling problem above. Similarly to [4] and [19], we define a representative database as a database where the distributions of the relationships between the tables are preserved from the original database. As foreign keys are enforced links between tables, they represent invaluable inputs to depict the relationships between data in a relational database.…”

Section: Introduction (mentioning), confidence: 99%
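Under this definition, each foreign key induces a distribution that can be read directly from the data. A minimal sketch, assuming the distribution of interest is the fan-out (how many child tuples reference each parent key), including parents that no child references; the function name and its inputs are hypothetical:

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional


def relationship_distribution(
    parent_keys: Iterable[Hashable], child_fk_values: Iterable[Optional[Hashable]]
) -> Dict[int, int]:
    """Map each fan-out degree to the number of parent keys having that degree."""
    # How many child tuples reference each parent key (NULL references ignored).
    fanout = Counter(value for value in child_fk_values if value is not None)
    # Unreferenced parents get degree 0; dangling child references are ignored.
    return dict(Counter(fanout[key] for key in parent_keys))
```

For example, `relationship_distribution([0, 1, 2], [0, 0, 1])` returns `{2: 1, 1: 1, 0: 1}`. A representative database scaled by a rate s would be expected to show each of these counts multiplied by s.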


ReX: Extrapolating Relational Data in a Representative Way

et al. 2015

Self Cite


Generating synthetic data is useful in multiple application areas (e.g., database testing, software testing). Nevertheless, existing synthetic data generators generally lack the necessary mechanism to produce realistic data unless the user supplies a complex set of inputs, such as the characteristics of the desired data. An automated and efficient technique is needed for generating realistic data. In this paper, we propose ReX, a novel extrapolation system targeting relational databases that aims to produce a representative extrapolated database given an original one and a natural scaling rate. Furthermore, we evaluate our system against an existing realistic scaling method, UpSizeR, by measuring the representativeness of the extrapolated database to the original one, the accuracy of approximate query answering, the database size, and the performance of both methods. Results show that our solution significantly outperforms the compared method in all considered dimensions.



ReX: Representative extrapolating relational databases

Buda, Cerqueus, Grava et al. 2017

Information Systems


Forester: Approximate Processing of an Imperative Procedure for Query-Time Exploratory Data Analysis in a Relational Database

Rahman, Lee 2024

Electronics


Query-time Exploratory Data Analysis (qEDA) is an increasingly demanding aspect of the data analysis process that entails visually and quantitatively summarizing, comprehending, and interpreting the primary characteristics of a dataset. Imperative procedures are now popular in relational databases for EDA because they let us combine multiple dependent declarative queries with imperative logic. As online analytical processing (OLAP) systems contain extremely large datasets, data scientists often need quick visualizations of data, obtained through approximate processing of imperative procedures, before analyzing the data in its entirety. We identify a gap in existing techniques: they cannot sample declarative dependent statements and control logic at the same time, nor perform multi-dependent, sampling-based approximate processing within the time permitted in qEDA. Traditional approximate query processing (AQP) relies on tuple sampling to approximate a single query, executing it over arbitrary random samples of tables. However, available AQP methods cannot produce a sample that remains representative of the data distribution across dependent statements, so they cannot estimate multiple dependent statements both accurately and quickly. Meanwhile, sampling of control structures, such as loops and conditional statements, has been studied separately, without regard to the imperative structure of statements in a procedure. In this study, we propose Forester, a novel agile approximate processing method for imperative procedures that performs imperative program-aware sampling, covering statements together with their control regions (i.e., branches and loops), and processes them approximately within the time permitted in qEDA. Our method produces more targeted samples for each relation while maintaining the data and control flow of dependent queries and imperative logic, and it determines, across all statements, every condition on a relation so that the sample is guaranteed to contain the data relevant to dependent data distributions. Using a workload of multi-statement imperative procedures on the Transaction Processing Performance Council Decision Support (TPC-DS) database, our experiments demonstrate that Forester outperforms the existing system in sampling, producing minimal error, and improving response time.
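The single-query AQP baseline that this abstract contrasts with Forester can be illustrated with uniform (Bernoulli) tuple sampling: keep each tuple independently with probability `rate`, evaluate the aggregate on the sample, and scale a SUM by 1/`rate`. This is a generic textbook estimator, not Forester's method, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def bernoulli_sample(values: Sequence[float], rate: float, rng: random.Random) -> List[float]:
    """Keep each value independently with probability `rate`."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return [v for v in values if rng.random() < rate]


def estimate_sum(sample: Sequence[float], rate: float) -> float:
    """Unbiased SUM estimate: each sampled tuple stands for 1/rate original tuples."""
    return sum(sample) / rate
```

Such an estimator works for one aggregate, but a sample drawn for one statement need not contain the tuples that a later dependent statement needs, which is the gap the abstract describes.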

