The family of mapreduce and large-scale data processing systems

Sakr, Sherif; Liu, Anna; Fayoumi, Ayman G.

doi:10.1145/2522968.2522979

Cited by 142 publications

(44 citation statements)

References 114 publications

“…However, such frameworks are not efficient for directly implementing iterative graph algorithms which often require multiple stages of complex joins [37]. In addition, the general-purpose join and aggregation mechanisms defined in such distributed frameworks are not designed to leverage the common patterns and structure in iterative graph algorithms.…”

Section: Hadoop-based Systems (mentioning)

“…The popular MapReduce framework [10] and its open source realization, Hadoop, together with its associated ecosystem (e.g., Pig, Hive) represent the pervasive technology for big data processing [37]. In principle, the MapReduce framework provides a simple but powerful programming model that enables developers to easily build scalable parallel algorithms to process massive amounts of data on clusters of commodity machines.…”

Section: Introduction (mentioning)

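The map/shuffle/reduce programming model described in the excerpt can be illustrated with a minimal single-process sketch. It is not Hadoop's API: `run_mapreduce`, `word_count_mapper` and `word_count_reducer` are hypothetical names, and an in-memory dictionary stands in for the distributed shuffle that groups intermediate values by key.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process stand-in for one MapReduce job."""
    groups = defaultdict(list)
    # Map phase: every input record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase (simulated): collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: each key is processed once, together with all its values.
    output = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            output[out_key] = out_value
    return output


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated token.
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word, counts):
    # Sum the partial counts gathered for one word.
    yield word, sum(counts)


if __name__ == "__main__":
    lines = ["Hadoop implements MapReduce", "MapReduce scales out"]
    print(run_mapreduce(lines, word_count_mapper, word_count_reducer))
    # {'hadoop': 1, 'implements': 1, 'mapreduce': 2, 'out': 1, 'scales': 1}
```

The developer supplies only the two small functions; partitioning, grouping and parallel execution are the framework's responsibility, which is what makes the model simple to program against.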

“…In principle, the MapReduce framework provides a simple but powerful programming model that enables developers to easily build scalable parallel algorithms to process massive amounts of data on clusters of commodity machines. However, the MapReduce programming model has its own limitations [37]. For example, it does not provide any direct support for iterative data analysis (or equivalently, recursive) tasks.…”

Section: Introduction (mentioning)

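The missing support for iterative tasks can be sketched the same way. In this illustrative example (not code from the cited works), a hypothetical driver loop runs one complete map/shuffle/reduce pass per PageRank iteration; each pass must re-emit the graph structure and join it with the rank shares, echoing the "multiple stages of complex joins" noted earlier. The sketch assumes every node has at least one outgoing edge (dangling nodes are not handled) and uses the conventional damping factor of 0.85; all function names are hypothetical.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional PageRank damping factor (assumed)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process MapReduce pass: map, shuffle, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


def pagerank_mapper(record):
    node, (rank, out_links) = record
    # Re-emit the adjacency list; otherwise the next pass would lose the graph.
    yield node, ("links", out_links)
    # Split this node's rank evenly among its outgoing edges.
    for target in out_links:
        yield target, ("share", rank / len(out_links))


def make_pagerank_reducer(num_nodes):
    def pagerank_reducer(node, values):
        # Join the node's adjacency list with the rank shares sent to it.
        out_links, incoming = [], 0.0
        for tag, payload in values:
            if tag == "links":
                out_links = payload
            else:
                incoming += payload
        new_rank = (1 - DAMPING) / num_nodes + DAMPING * incoming
        yield node, (new_rank, out_links)
    return pagerank_reducer


def pagerank(graph, iterations=50):
    """graph maps each node to a non-empty list of out-neighbours (assumed)."""
    n = len(graph)
    state = [(node, (1.0 / n, links)) for node, links in graph.items()]
    reducer = make_pagerank_reducer(n)
    # MapReduce itself has no loop construct: the driver launches one full
    # pass per iteration and feeds each pass's output into the next.
    for _ in range(iterations):
        state = run_mapreduce(state, pagerank_mapper, reducer)
    return {node: rank for node, (rank, _) in state}


if __name__ == "__main__":
    ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
    print({node: round(rank, 3) for node, rank in ranks.items()})
```

On a real Hadoop cluster each pass would typically be a separate job that reads its input from and writes its output to the distributed file system, so the cost of re-shipping the unchanged graph structure is paid on every iteration.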

Large scale graph processing systems: survey and an experimental evaluation

et al. 2015 (self-citation)

Graph is a fundamental data structure that captures relationships between different data entities. In practice, graphs are widely used for modeling complicated data in different application domains such as social networks, protein networks, transportation networks, bibliographical networks, knowledge bases and many more. Currently, graphs with millions and billions of nodes and edges have become very common. In principle, graph analytics is an important big data discovery technique. Therefore, with the increasing abundance of large graphs, designing scalable systems for processing and analyzing large scale graphs has become one of the most timely problems facing the big data research community. In general, scalable processing of big graphs is a challenging task due to their size and the inherent irregular structure of graph computations. Thus, in recent years, we have witnessed an unprecedented interest in building big graph processing systems that attempted to tackle these challenges. In this article, we provide a comprehensive survey over the state-of-the-art of large scale graph processing platforms. In addition, we present an extensive experimental study of five popular systems in this domain, namely, GraphChi, Apache Giraph, GPS, GraphLab and GraphX. In particular, we report and analyze the performance characteristics of these systems using five common graph processing algorithms and seven large graph datasets. Finally, we identify a set of the current open research challenges and discuss some promising directions for future research in the domain of large scale graph processing.

“…Zhao et al. (2014) provided a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework. Sakr, Liu, and Fayoumi (2013) surveyed the MapReduce framework's variants and its extensions for large scale data processing.…”

Section: Big Data and Big Data Analytics (mentioning)

AHP Model for the Big Data Analytics Platform Selection

Lněnička

2015

AIP

Big data analytics refers to a set of advanced technologies, which are designed to efficiently operate and maintain data that are not only big, but also high in variety and velocity. This paper analyses these emerging big data technologies and presents a comparison of the selected big data analytics platforms through the whole data life. The main aim is then to propose and demonstrate the use of an AHP model for the big data analytics platform selection, which may be used by businesses, public sector institutions as well as citizens to solve multiple criteria decision-making problems. It would help them to discover patterns, relationships and useful information in their big data, make sense of them and to take responsive action.

“…To make our presentation self-contained, we briefly review here these concepts and the associated notations. The interested reader may find in [23] a survey of NoSQL systems, while [32,73] cover MapReduce variations and other massively parallel data management frameworks.…”

Section: Building Blocks: Key-value Stores and MapReduce Systems (mentioning)

RDF in the clouds: a survey

2014

The Resource Description Framework (RDF) pioneered by the W3C is increasingly being adopted to model data in a variety of scenarios, in particular data to be published or exchanged on the Web. Managing large volumes of RDF data is challenging, due to the sheer size, the heterogeneity, and the further complexity brought by RDF reasoning. To tackle the size challenge, distributed storage architectures are required. Cloud computing is an emerging paradigm massively adopted in many applications for the scalability, fault-tolerance, and elasticity features it provides, enabling the easy deployment of distributed and parallel architectures. In this article, we survey RDF data management architectures and systems designed for a cloud environment, and more generally, those large-scale RDF data management systems that can be easily deployed therein. We first give the necessary background, then describe the existing systems and proposals in this area, and classify them according to dimensions related to their capabilities and implementation techniques. The survey ends with a discussion of open problems and perspectives.
