The surge in data volume necessitates the integration of Resource Description Framework (RDF) data within corporate environments. While Extract, Transform, Load (ETL) processes handle conventional data sources well, their scalability diminishes when applied to large and highly varied data sources, including RDF data. The latter constitutes a wealth of knowledge that, when harnessed via data warehouse technology, can augment corporate value in a fiercely competitive environment. The advent of platforms such as polystores offers opportunities for advanced hardware deployment. ETL processes require two crucial phases: partitioning and data allocation. Concurrently, the scientific community is spurred to devise ETL processes that support real-time analytics. This study proposes a novel architecture for ETL processes, named Open-Scala-ETL (Os-ETL). Equipped with a method for deploying a data warehouse on a polystore, Os-ETL enables real-time analysis. The primary objective of the Os-ETL solution is to resolve the complexities of deploying a graph-structured data warehouse on a polystore, a process that involves partitioning and data allocation. Os-ETL is a distributed solution that supports both batch and streaming processing using the Spark framework. Scala scripts are executed within this framework to partition RDF graphs and distribute the resulting fragments across various sites. The implementation of Os-ETL is based on Apache Spark, with ETL deployment on a Spark SQL polystore. This solution empowers companies using data warehouse technology to improve performance and scalability and to reduce latency between a data warehouse and its data sources. The approach has been assessed and validated using large-scale, heterogeneous data, encompassing the LUBM benchmark, CSV files, an Oracle database, and a Neo4j graph database. The results corroborate its superior performance in terms of scalability and optimization.
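To make the partitioning and allocation steps described above more concrete, the following is a minimal Scala-on-Spark sketch, not the Os-ETL implementation itself: it assumes triples are available as rows with columns `s`, `p`, `o` at a hypothetical HDFS path, uses predicate-based (vertical) partitioning as an illustrative scheme, and writes the resulting fragments to an assumed warehouse location rather than to distinct polystore sites.

```scala
// Minimal sketch (assumptions noted above): predicate-based RDF partitioning
// with Spark SQL, followed by a simple write-out of the fragments.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RdfPartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdf-partition-sketch")
      .getOrCreate()

    // Hypothetical input: a CSV of triples with columns s, p, o.
    val triples = spark.read
      .option("header", "true")
      .csv("hdfs:///data/lubm_triples.csv")   // assumed path

    // Vertical (predicate-based) partitioning: group triples sharing a predicate.
    val fragments = triples.repartition(col("p"))

    // Allocation sketch: write one directory per predicate fragment;
    // a real allocator would map fragments to distinct sites of the polystore.
    fragments.write
      .partitionBy("p")
      .mode("overwrite")
      .parquet("hdfs:///warehouse/rdf_fragments")   // assumed target

    spark.stop()
  }
}
```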