Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing 2012
DOI: 10.1145/2169090.2169092
Nobody ever got fired for using Hadoop on a cluster

Abstract: The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that "nobody ever got fired for using Hadoop on a cluster"! We completely agree that Hadoop on a cluster is the right solution for jobs where the input data is multi-terabyte or larger. However, in this position paper we ask whether this is the right path for general-purpose data analytics. Evidence suggests that…

Cited by 54 publications (28 citation statements); references 1 publication.
“…However, this is potentially costly, as such systems usually have high implementation costs, as well as significant further costs for updating and analyzing data. Even if we can reduce the problem to a few hundred GB that will fit into main memory, these companies still need capable staff in their IT department who can maintain and program complex in-memory multi-core infrastructures [22]. In particular, for small and medium enterprises (SMEs), the risks associated with such infrastructures are a strong barrier to innovation.…”
Section: Introduction (mentioning)
confidence: 99%
“…This is in line with reports in [8], [9], [10] that the majority of clusters are rather small, with fewer than 50 nodes. Also, we experimented with raw datasets on the order of hundreds of gigabytes (the corresponding RDDs are 1-5 times larger), since this is the typical dataset size processed even in companies that are notorious for their big-data application demands [11]. The Spark version was 1.5.2.…”
Section: Our Setting and the Benchmarking Applications (mentioning)
confidence: 99%
“…We believe 1 GB of RAM per phone is enough to run most MapReduce-style distributed jobs. Note here that the work in [35] reports that the median input size for such jobs is less than 14 GB. One can easily partition such jobs across 15-20 phones and still schedule them using CWC.…”
Section: Design and Architecture (mentioning)
confidence: 99%
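The partitioning arithmetic in the last citation statement can be sketched as below. This is a minimal illustration, not code from any of the cited papers: the `overhead` factor (memory consumed beyond the raw input split, e.g. by the framework and working set) is a hypothetical assumption introduced here to show how a 14 GB median job with 1 GB of RAM per phone lands in the 15-20 phone range.

```python
import math

def phones_needed(input_gb: float, ram_per_phone_gb: float = 1.0,
                  overhead: float = 1.25) -> int:
    """Number of phones needed so each input partition fits in RAM.

    `overhead` is an illustrative assumption: each phone can devote only
    ram_per_phone_gb / overhead to the raw data split.
    """
    usable_gb = ram_per_phone_gb / overhead
    return math.ceil(input_gb / usable_gb)

# Median job from [35]: < 14 GB input, 1 GB RAM per phone.
print(phones_needed(14))  # -> 18, within the 15-20 range quoted above
```

Varying the assumed overhead between roughly 1.1 and 1.4 keeps the result inside the 15-20 phone window, which is consistent with the claim as stated.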