Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing 2012
DOI: 10.1145/2169090.2169092
Nobody ever got fired for using Hadoop on a cluster

Abstract: The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that "nobody ever got fired for using Hadoop on a cluster"! We completely agree that Hadoop on a cluster is the right solution for jobs where the input data is multi-terabyte or larger. However, in this position paper we ask whether this is the right path for general-purpose data analytics. Evidence suggests that…

Cited by 54 publications (28 citation statements); references 1 publication.
“…However, this is potentially costly, as such systems usually have high implementation costs, as well as significant further costs for updating and analyzing data. Even if we can reduce the problem to a few hundred GB that will fit into main memory, these companies still need capable staff in their IT department who can maintain and program complex in-memory multi-core infrastructures [22]. In particular, for small and medium enterprises (SMEs), the risks associated with such infrastructures are a strong barrier to innovation.…”
Section: Introduction (mentioning)
confidence: 99%
“…This is in line with reports in [8], [9], [10] that the majority of clusters are rather small, with fewer than 50 nodes. Also, we experimented with raw datasets on the order of hundreds of gigabytes (the corresponding RDDs are 1-5 times larger), since this is the typical dataset size processed even in companies that are notorious for their big-data application demands [11]. The Spark version was 1.5.2.…”
Section: Our Setting and the Benchmarking Applications (mentioning)
confidence: 99%
“…We believe 1 GB of RAM per phone is enough to run most MapReduce-style distributed jobs. Note here that the work in [35] reports that the median input size for such jobs is less than 14 GB. One can easily partition such jobs across 15-20 phones and still schedule them using CWC.…”
Section: Design and Architecture (mentioning)
confidence: 99%
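The partitioning arithmetic in the last citation statement can be sketched as below. This is a minimal illustration, not code from any of the cited papers: the `overhead` factor (memory consumed beyond the raw input split, e.g. by the framework and working set) is a hypothetical assumption introduced here to show how a 14 GB median job with 1 GB of RAM per phone lands in the 15-20 phone range.

```python
import math

def phones_needed(input_gb: float, ram_per_phone_gb: float = 1.0,
                  overhead: float = 1.25) -> int:
    """Number of phones needed so each input partition fits in RAM.

    `overhead` is an illustrative assumption: each phone can devote only
    ram_per_phone_gb / overhead to the raw data split.
    """
    usable_gb = ram_per_phone_gb / overhead
    return math.ceil(input_gb / usable_gb)

# Median job from [35]: < 14 GB input, 1 GB RAM per phone.
print(phones_needed(14))  # -> 18, within the 15-20 range quoted above
```

Varying the assumed overhead between roughly 1.1 and 1.4 keeps the result inside the 15-20 phone window, which is consistent with the claim as stated.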