2012 IEEE International Symposium on Workload Characterization (IISWC) 2012
DOI: 10.1109/iiswc.2012.6402895

Workload characterization on a production Hadoop cluster: A case study on Taobao

Abstract: MapReduce is becoming the state-of-the-art computing paradigm for processing large-scale datasets on a large cluster with tens or thousands of nodes. It has been widely used in various fields such as e-commerce, Web search, social networks, and scientific computation. Understanding the characteristics of MapReduce workloads is the key to achieving better configuration decisions and improving the system throughput. However, workload characterization of MapReduce, especially in a large-scale production e…

Citations: cited by 91 publications (35 citation statements)
References: 27 publications
“…For example, we developed a statistical model based on one month of historical traces from Company ABC's production database workload. The workload distributions from Company ABC (reported further in the evaluation section) are similar to the distributions described in [40]. In particular, the task duration approximately follows a lognormal distribution, and the job arrival approximately follows a Poisson process.…”
Section: Workload Generation (mentioning)
confidence: 51%
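
The statement above names two distributional assumptions: lognormal task durations and Poisson job arrivals. As a rough illustration only, the following is a minimal sketch of a synthetic workload generator built on those two assumptions. The function name `generate_workload` and the parameter values (`arrival_rate`, `mu`, `sigma`) are hypothetical placeholders chosen for the example; they are not taken from the cited paper or from the Taobao trace.

```python
import random


def generate_workload(num_jobs, arrival_rate, mu, sigma, seed=0):
    """Return a list of (arrival_time, task_duration) pairs.

    Arrivals form a Poisson process, simulated by summing exponential
    inter-arrival gaps with rate `arrival_rate` (jobs per time unit).
    Durations are drawn from a lognormal distribution whose underlying
    normal has mean `mu` and standard deviation `sigma`.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)  # local generator for reproducibility
    current_time = 0.0
    jobs = []
    for _ in range(num_jobs):
        # Exponential gaps between arrivals yield a Poisson arrival process.
        current_time += rng.expovariate(arrival_rate)
        duration = rng.lognormvariate(mu, sigma)
        jobs.append((current_time, duration))
    return jobs


# Hypothetical parameters: 0.5 jobs per second on average.
jobs = generate_workload(num_jobs=1000, arrival_rate=0.5, mu=3.0, sigma=1.0)
```

In practice the parameters would be fitted to a real trace rather than chosen by hand; the sketch only shows how the two distributions combine into a job stream.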
“…For the simulation workloads, we use traces from Yahoo! [28] and Taobao [29], while for the empirical workloads, we execute short-running jobs from the PUMA benchmark [30] to display the applicability of our framework in commonly used workloads from major companies like Twitter. Our experimental results illustrate that our search algorithm outperforms state-of-the-art schemes by a few orders of magnitude in terms of execution time.…”
Section: Contributions (mentioning)
confidence: 99%
“…Hence, as the number of users increases, the role of the in-memory caching tier becomes increasingly important to meet the service level agreements/requirements (SLAs/SLRs) for the services under consideration. Similar to KV stores, applications at other tiers of the datacenter also require access to DRAM, although, after a certain point, the applications are not sensitive to DRAM [3,4,5]. Some of our previous work in this regard [3] characterized the most popular application for each server tier.…”
Section: Datacenter Applications (mentioning)
confidence: 99%