SQLMR : A Scalable Database Management System for Cloud Computing

Hsieh, Meng-Ju; Chang, Chao-Rui; Ho, Li-Yung; Wu, Jan‐Jan; Liu, Pangfeng

doi:10.1109/icpp.2011.54

Cited by 21 publications

(7 citation statements)

References 8 publications


“…The architecture of SLMR consists of: (a) a SQL-MapReduce compiler, which converts SQL statements to sequential MapReduce jobs, (b) a query result manager, which searches the log to find if similar query results are available in the cache, (c) a database partitioning and indexing manager, which is responsible for managing data files, partitioning the new data, and creating indexes, and (d) an optimized Hadoop, which is responsible for the generation of optimized MapReduce jobs. Hsieh et al [116] conducted several experiments to illustrate the scalability of data and system with respect to increasing data sizes. The approach suffers from the network load unbalancing because of the random placement of reducers, causing the reducers to become stragglers on a busy rack.…”

Section: Data Management Approaches and Systems (mentioning)

confidence: 99%

“…Hsieh et al [116] proposed SQLMR, which is a data management system for the cloud. SQLMR combines SQL and MapReduce.…”

Section: Data Management Approaches and Systems (mentioning)

confidence: 99%

“…Moreover, Ref. [116] provides a compiler to translate SQL programs to MapReduce. Furthermore, a technique to dynamically convert SQL files to HDFS that is to be accepted as an input to MapReduce runtime engine is proposed.…”

Section: Data Management Approaches and Systems (mentioning)

confidence: 99%


Performance analysis of data intensive cloud systems based on data management and replication: a survey

Malik, Khan, Ewen, et al. 2015

Distrib Parallel Databases


As we delve deeper into the 'Digital Age', we witness an explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data was created on a daily basis that originated from a myriad of sources and applications including mobile devices, sensors, individual archives, social networks, Internet of Things, enterprises, cameras, software logs, etc. Such 'Data Explosions' have led to one of the most challenging research issues of the current Information and Communication Technology era: how to optimally manage (e.g., store, replicate, filter, and the like) such large amounts of data and identify new ways to analyze large amounts of data for unlocking information. It is clear that such large data streams cannot be managed by setting up on-premises enterprise database systems as it leads to a large up-front cost in buying and administering the hardware and software systems. Therefore, next generation data management systems must be deployed on cloud. The cloud computing paradigm provides scalable and elastic resources, such as data and services accessible over the Internet. Every Cloud Service … (data replication and management) to provide different QoS attributes is deliberated. Furthermore, the performance advantages and disadvantages of data replication and management approaches in the cloud computing environments are analyzed. Open issues and future challenges related to data consistency, scalability, load balancing, processing and placement are also reported.


“…A lot of time is consumed in pre-partitioning phase. Hsieh et al (2011) Implemented one system model named "SQLMR", which is a hybrid approach to fill the gap between SQL-based and MapReduce data processing. With effective part partitioning and B tree indexing, low overhead file construction, optimized rack awareness algorithm, query result cache mechanism the system produced best results as compare to HadoopDB.…”

Section: Related Work (mentioning)

confidence: 99%

E-GENMR: Enhanced Generalized Query Processing using Double Hashing Technique through MapReduce in Cloud Database Management System

Malhotra, Doja, Alam, et al. 2017

Journal of Computer Science


Big Data, Cloud computing and Data Science is the booming future of IT industries. The common thing among all the new techniques is that they deal with not just Data but Big Data. Users store various kinds of data on cloud repositories. Cloud Database Management System deals with these large sets of data. Cloud Database service provider deals with many obstacles while providing various service. Amongst all the challenges processing of large amount of data, interoperability and security are the major concerns that are explained in this study. Enhanced Generalized Query Processing through MapReduce (E-GENMR) is a prototype model that provides solution for these problems. Firstly, traditional approaches are not suitable for processing such gigantic amount of data as they are not able to handle such amount of data. Various solutions have been developed such as Hadoop, MapReduce Programming codes, HIVE, PIG etc. but these technologies don't provide solution for these problems at the same time and moreover users are not compatible with these latest technologies like MapReduce codes. E-GENMR provides interoperability as it takes queries written in various RDBMS forms like SQL Server, ORACLE, DB2, MYSQL and convert into MapReduce codes as they are considered to be the efficient way for processing large data. Secondly, Client's data is stored in encrypted form and processing is done on this data hence it ensures the security aspect. Indexing plays a very important role in processing queries, in E-GENMR indexing is implemented using closed double hashing technique. We compared various query processing time of E-GENMR for encrypted data and unencrypted data. A comparison of various queries has been done to evaluate the performance of E-GENMR with latest techniques like Hadoopdb, SQLMR, HIVE and PIG and it has been concluded that E-GENMR shows better performance.


“…These techniques can achieve automatic scalability by changing environments and loads. In [3], a hybrid solution, called SQLMR was proposed. The SQLMR combines the programming advantage of SQL with the fault tolerant, heterogeneous cluster, scalable capabilities of MapReduce.…”

Section: Introduction (mentioning)

confidence: 99%

A design of scalable service platform for sensor network applications

Naruephiphat, Promya, Charnsripinyo, 2014

2014 11th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technolo


This paper presents a design of scalable service platform for sensor network applications, which have to deal with large volume of data. It is challenging to design and develop a scalable system for managing a massive amount of data to be stored, processed, and transferred. In our proposed system, we integrate advantages of several techniques, which are reactor pattern, MVC model and Blueprint CSS framework. In our performance test based on reactor pattern, the simulation results show that 100 percent of successful rate can be achieved at high load. The implementation of web application based on MVC model provides modifiability and extensibility in our service platform. Furthermore, the web application development base on Blueprint CSS framework can support various web browsers. Based on our experiments, the service platform can be practically applied for other new services and applications.

