iHOME: Index-Based JOIN Query Optimization for Limited Big Data Storage

Sahal, Radhya; Nihad, Marwah; Khafagy, Mohamed Helmy; Omara, Fatma A.

doi:10.1007/s10723-018-9431-9

Cited by 16 publications

(6 citation statements)

References 32 publications

“…This protocol shows that SQL systems are not practical for single-patient queries since response times are slower. The query optimization problem was addressed in [22], where the authors addressed the issue of the efficient execution of JOIN queries in the Hadoop query language, Hive, over limited big data storages.…”

Section: Related Work (mentioning)

confidence: 99%

Performance Impact of Optimization Methods on MySQL Document-Based and Relational Databases

et al. 2021

Databases are an important part of today’s applications where large amounts of data need to be stored, processed, and accessed quickly. One of the important criteria when choosing to use a database technology is its data processing performance. In this paper, some methods for optimizing the database structure and queries were applied on two popular open-source database management systems: MySQL as a relational DBMS, and document-based MySQL as a non-relational DBMS. The main objective of this paper was to conduct a comparative analysis of the impact that the proposed optimization methods have on each specific DBMS when carrying out CRUD (CREATE, READ, UPDATE, DELETE) requests. To perform the analysis and performance evaluation of CRUD operations for different amounts of data, a case study testing architecture based on Java was developed and used to show how the databases’ proposed optimization methods can influence the performance of the application, and to highlight the differences in response time and complexity. The results obtained show the degree to which the proposed optimization methods contributed to the application’s performance improvement in the case of both databases; based on these, a detailed analysis and several conclusions are presented to support a decision for choosing a specific approach.

“…In particular, the BlockJoin algorithm is based on two concepts, index-join and late materialization, which are known in the context of parallel dataflow engines. In addition, an index-based system for reusing data called indexing HiveQL Optimization for join over Multisession Big Data Environment (iHOME) was presented [26]. The proposed iHOME system addresses eight cases of join queries which are classified into three groups: Similar-to-iHOME, Compute-on-iHOME, and Filter-of-iHOME.…”

Section: Join Optimization Different Mapreduce Join Strategies (mentioning)

confidence: 99%

“…Substantially, the shuffle step is considered expensive since it needs to sort and join all tuples. Therefore, the shuffling operations need to be optimized to improve the join performance and reduce the total intermediate data size of join query [26]. However, exploiting sharing opportunities including loading, sorting, and joining data among multiple join queries is a challenging task.…”

Section: Introduction (mentioning)

confidence: 99%

Exploiting Sharing Join Opportunities in Big Data Multiquery Optimization with Flink

et al. 2020

Self Cite

Multiway join queries incur high-cost I/O operations over large-scale data. Exploiting sharing join opportunities among multiple multiway joins could be beneficial to reduce query execution time and shuffled intermediate data. Although multiway join optimization has been carried out in MapReduce, different design principles (i.e., in-memory Big Data platforms, Flink) are not considered. To bridge the gap of not considering the optimization of Big Data platforms, an end-to-end multiway join over Flink, which is called Join-MOTH system (J-MOTH), is proposed to exploit sharing data granularity, sharing join granularity, and sharing implicit sorts within multiple join queries. For sharing data, our previous work, Multiquery Optimization using Tuple Size and Histogram (MOTH) system, has been introduced to consider the granularity of sharing data opportunities among multiple queries. For sharing sort, our previous work, Sort-Based Optimizer for Big Data Multiquery (SOOM), has been introduced to consider the implicit sorts among join queries. For sharing join, additional modules have been tailored to the J-MOTH optimizer to optimize sharing work by exploiting shared pipelined multiway join among multiple multiway join queries. The experimental evaluation has demonstrated that the J-MOTH system outperforms the naive and the state-of-the-art techniques by 44% for query execution time using TPC-H queries. Also, the proposed J-MOTH system introduces maximal intermediate data size reduction by 30% on average over Hadoop-like infrastructures.

“…All algorithms are implemented and tested in Hadoop (HDFS) and HBase, utilizing different queries on tables of various sizes and different score-attribute distributions. The query optimization problem is addressed in [20], where authors deal with the efficient execution of JOIN queries on top of Hadoop query language, Hive, over limited Big Data storages. A novel data integration methodology to query data individually from different relational and NoSQL database systems is proposed in [21].…”

Section: Background and Related Work (mentioning)

confidence: 99%

A Study on Join Operations in MongoDB Preserving Collections Data Models for Future Internet Applications

2019

Presently, we are observing an explosion of data that need to be stored and processed over the Internet, and characterized by large volume, velocity and variety. For this reason, software developers have begun to look at NoSQL solutions for data storage. However, operations that are trivial in traditional Relational DataBase Management Systems (DBMSs) can become very complex in NoSQL DBMSs. This is the case of the join operation to establish a connection between two or more DB structures, whose construct is not explicitly available in many NoSQL databases. As a consequence, the data model has to be changed or a set of operations has to be performed to address particular queries on data. Thus, open questions are: how do NoSQL solutions work when they have to perform join operations on data that are not natively supported? What is the quality of NoSQL solutions in such cases? In this paper, we deal with such issues specifically considering one of the major NoSQL document-oriented DBs available on the market: MongoDB. In particular, we discuss an approach to perform join operations at application layer in MongoDB that allows us to preserve data models. We analyse the performance of the proposed approach, discussing the introduced overhead in comparison with SQL-like DBs.
