2014 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/bigdata.2014.7004322

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness

Abstract: This article presents the ALOJA project, an initiative to produce mechanisms for an automated characterization of the cost-effectiveness of Hadoop deployments, and reports its initial results. ALOJA is the latest phase of a long-term collaborative engagement between BSC and Microsoft which, over the past 6 years, has explored a range of different aspects of computing systems, software technologies and performance profiling. While during the last 5 years Hadoop has become the de-facto platform for Big Data …

Cited by 22 publications (29 citation statements). References 5 publications.
“…Over time, collecting performance metrics has also become a Big Data problem: we have over 1.2TB of performance metrics for the executions after importing them into the database. A description of the architecture in ALOJA can be found in [11]. While this data is 45x smaller than the traces from profiling, as can be seen in Figure 2, as we get more executions we currently use it mostly for debugging executions manually.…”
Section: Benchmarking (mentioning)
confidence: 99%
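The quote above describes importing per-execution performance metrics into a database rather than keeping only raw profiling traces. As a rough illustration only (the CSV layout, table schema, and function names below are assumptions, not ALOJA's actual importer), a minimal sketch of such an import step could look like this:

```python
# Minimal sketch (not ALOJA's actual pipeline): load per-execution,
# sar-style metric CSVs and import them into a relational database so
# they can be queried instead of parsed from raw profiling traces.
import csv
import sqlite3
from pathlib import Path

def import_execution_metrics(db_path: str, exec_id: str, metrics_dir: str) -> None:
    """Insert every metric sample of one benchmark execution into SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS metrics (
               exec_id TEXT, host TEXT, timestamp TEXT,
               metric TEXT, value REAL)"""
    )
    for csv_file in Path(metrics_dir).glob("*.csv"):
        host = csv_file.stem  # hypothetical layout: one CSV per cluster node
        with csv_file.open() as fh:
            for row in csv.DictReader(fh):  # assumed columns: timestamp, metric, value
                conn.execute(
                    "INSERT INTO metrics VALUES (?, ?, ?, ?, ?)",
                    (exec_id, host, row["timestamp"], row["metric"], float(row["value"])),
                )
    conn.commit()
    conn.close()
```

Once the samples are in a table like this, per-run summaries can be produced with ordinary SQL aggregates instead of re-reading the original traces.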
“…While the results from the data aggregation efforts allow data to be processed interactively for the online analytic tools [11], the number of configuration choices keeps growing as the project expands in architectures and services (into the millions, for benchmarks where a single iteration can take hours to execute). In order to cope with the increasing number of configuration options, the project first had to sample manually from the search space and group results to extrapolate between clusters.…”
Section: Predictive Analytics (mentioning)
confidence: 99%
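The statement above refers to sampling from a configuration search space that is far too large to benchmark exhaustively. A minimal sketch of uniform sampling over a cross product of deployment variables is given below; the variable names and value ranges are purely illustrative and not ALOJA's actual search space, and the project's own sampling and cross-cluster extrapolation are more involved:

```python
# Minimal sketch, not the project's actual method: randomly sample a small
# subset of a combinatorial Hadoop configuration space when benchmarking
# every combination is infeasible.
import itertools
import random

# Hypothetical deployment variables (the real ALOJA space is far larger).
search_space = {
    "mappers": [4, 8, 16],
    "io_file_buffer_kb": [64, 128],
    "block_size_mb": [128, 256],
    "compression": ["none", "snappy", "bzip2"],
    "disk": ["HDD", "SSD"],
}

def sample_configurations(space: dict, n: int, seed: int = 42) -> list[dict]:
    """Draw n distinct configurations uniformly from the full cross product."""
    keys = list(space)
    all_combos = list(itertools.product(*(space[k] for k in keys)))
    random.seed(seed)
    picked = random.sample(all_combos, min(n, len(all_combos)))
    return [dict(zip(keys, combo)) for combo in picked]

if __name__ == "__main__":
    for cfg in sample_configurations(search_space, 5):
        print(cfg)  # each sampled configuration would then be benchmarked
```

The benchmarked samples can then serve as training data for models that extrapolate run time or cost to configurations and clusters that were never executed.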