Anais Do XXXV Simpósio Brasileiro De Banco De Dados (SBBD 2020) 2020
DOI: 10.5753/sbbd.2020.13637

A Process for Inference of Columnar NoSQL Database Schemas

Abstract: Although NoSQL databases do not require a schema a priori, awareness of the database schema is essential for activities such as data integration, data validation, or data interoperability. This paper presents a process for inferring columnar NoSQL database schemas. We validate the proposed process through a prototype tool that extracts schemas from the HBase columnar NoSQL database system. HBase was chosen as a case study because it is one of the most popular columnar NoSQL solutions. When compared to rel…

Citations: cited by 4 publications (1 citation statement)
References: 6 publications

“…This argument could be extended further: Even basic data preprocessing and delivery of first insights is still tedious with tabular data in conjunction with today's tooling. Data engineering libraries such as Pandas and Spark provide only minimal schema inference capabilities on a columnar level, and NoSQL databases such as HBase require even manual inference [4]. Tools like SDV are highly capable, but also dependency-heavy and focused on model training rather than fast detailed schema inference.…”
Section: Motivation
Citation type: mentioning
Confidence: 99%