2014 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/bigdata.2014.7004322

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness

Abstract: This article presents the ALOJA project, an initiative to produce mechanisms for an automated characterization of the cost-effectiveness of Hadoop deployments, and reports its initial results. ALOJA is the latest phase of a long-term collaborative engagement between BSC and Microsoft which, over the past 6 years, has explored a range of different aspects of computing systems, software technologies and performance profiling. While during the last 5 years Hadoop has become the de-facto platform for Big Data …

Cited by 22 publications (29 citation statements). References 5 publications.
“…Over time, collecting performance metrics has also become a Big Data problem: we have over 1.2TB of performance metrics for the executions after importing them into the database. A description of the architecture in ALOJA can be found in [11]. While this data is 45x smaller than the traces from profiling, as can be seen in Figure 2, as we get more executions we currently use it mostly for debugging executions manually.…”
Section: Benchmarking (mentioning)
confidence: 99%
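The quote above describes importing per-execution performance metrics into a database rather than keeping only raw profiling traces. As a rough illustration only (the CSV layout, table schema, and function names below are assumptions, not ALOJA's actual importer), a minimal sketch of such an import step could look like this:

```python
# Minimal sketch (not ALOJA's actual pipeline): load per-execution,
# sar-style metric CSVs and import them into a relational database so
# they can be queried instead of parsed from raw profiling traces.
import csv
import sqlite3
from pathlib import Path

def import_execution_metrics(db_path: str, exec_id: str, metrics_dir: str) -> None:
    """Insert every metric sample of one benchmark execution into SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS metrics (
               exec_id TEXT, host TEXT, timestamp TEXT,
               metric TEXT, value REAL)"""
    )
    for csv_file in Path(metrics_dir).glob("*.csv"):
        host = csv_file.stem  # hypothetical layout: one CSV per cluster node
        with csv_file.open() as fh:
            for row in csv.DictReader(fh):  # assumed columns: timestamp, metric, value
                conn.execute(
                    "INSERT INTO metrics VALUES (?, ?, ?, ?, ?)",
                    (exec_id, host, row["timestamp"], row["metric"], float(row["value"])),
                )
    conn.commit()
    conn.close()
```

Once the samples are in a table like this, per-run summaries can be produced with ordinary SQL aggregates instead of re-reading the original traces.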
“…While the results from the data aggregation efforts allow data to be processed interactively for the online analytic tools [11], the number of configuration choices keeps growing as the project expands in architectures and services (into the millions, for benchmarks where a single iteration can take hours to execute). In order to cope with the increasing number of configuration options, the project first had to sample manually from the search space and group results to extrapolate between clusters.…”
Section: Predictive Analytics (mentioning)
confidence: 99%
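The statement above refers to sampling from a configuration search space that is far too large to benchmark exhaustively. A minimal sketch of uniform sampling over a cross product of deployment variables is given below; the variable names and value ranges are purely illustrative and not ALOJA's actual search space, and the project's own sampling and cross-cluster extrapolation are more involved:

```python
# Minimal sketch, not the project's actual method: randomly sample a small
# subset of a combinatorial Hadoop configuration space when benchmarking
# every combination is infeasible.
import itertools
import random

# Hypothetical deployment variables (the real ALOJA space is far larger).
search_space = {
    "mappers": [4, 8, 16],
    "io_file_buffer_kb": [64, 128],
    "block_size_mb": [128, 256],
    "compression": ["none", "snappy", "bzip2"],
    "disk": ["HDD", "SSD"],
}

def sample_configurations(space: dict, n: int, seed: int = 42) -> list[dict]:
    """Draw n distinct configurations uniformly from the full cross product."""
    keys = list(space)
    all_combos = list(itertools.product(*(space[k] for k in keys)))
    random.seed(seed)
    picked = random.sample(all_combos, min(n, len(all_combos)))
    return [dict(zip(keys, combo)) for combo in picked]

if __name__ == "__main__":
    for cfg in sample_configurations(search_space, 5):
        print(cfg)  # each sampled configuration would then be benchmarked
```

The benchmarked samples can then serve as training data for models that extrapolate run time or cost to configurations and clusters that were never executed.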