Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering 2013
DOI: 10.1145/2479871.2479906
Benchmarking approach for designing a MapReduce performance model

Abstract: In MapReduce environments, many programs are reused for processing regularly arriving new data. A typical user question is how to estimate the completion time of these programs as a function of a new dataset and the cluster resources. In this work, we offer a novel performance evaluation framework for answering this question. We observe that the execution of each map (reduce) task consists of specific, well-defined data processing phases. Only the map and reduce functions are custom, and their execution…
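The abstract's idea of estimating completion time from per-task measurements can be illustrated with the classic makespan bounds for scheduling n tasks on k slots. This is a minimal sketch, not the authors' actual model; the task durations and slot count below are hypothetical benchmark values.

```python
# Illustrative sketch (assumption: not the paper's exact framework):
# bound the completion time of one MapReduce stage (map or reduce)
# from measured per-task durations, using the standard makespan bounds
# for n tasks on k slots:
#   lower = (n / k) * avg_duration
#   upper = ((n - 1) / k) * avg_duration + max_duration

def stage_completion_bounds(task_durations, num_slots):
    """Return (lower, upper) completion-time bounds for one stage."""
    n = len(task_durations)
    avg = sum(task_durations) / n
    mx = max(task_durations)
    lower = (n / num_slots) * avg
    upper = ((n - 1) / num_slots) * avg + mx
    return lower, upper

# Example: 8 map tasks (hypothetical durations, seconds) on 4 map slots.
lower, upper = stage_completion_bounds([10, 12, 11, 9, 10, 13, 12, 11], 4)
```

For a new dataset, one would scale the measured per-task durations by the new input size before applying the bounds; that scaling step is where the paper's phase profiling comes in.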

Cited by 41 publications (24 citation statements); references 7 publications.
“…Counters help profile the job performance and provide important information for designing new schedulers. We utilize the extended set of counters from [6] in DyScale.…”
Section: MapReduce Background (mentioning)
confidence: 99%
“…According to Zhang et al. [3], "MapReduce and Hadoop represent an economically compelling alternative for efficient large scale data processing and cost-effective analytics over 'Big Data' in the enterprise". Details on these two distributed data processing components will be discussed in the next subsections.…”
Section: Data Storage and Processing (mentioning)
confidence: 99%
“…These workloads continuously evolve as the user base changes, as features are activated or disabled and as user feature preferences change. Such varying field workloads often lead to load tests that are not reflective of the field [9,46], yet these workloads have a major impact on the performance of the system [15,49].…”
Section: Introduction (mentioning)
confidence: 99%
“…Performance analysts must determine the cause of any deviation in the counter values from the specified or expected range (e.g., response time exceeds the maximum response time permitted by the service level agreements or memory usage exceeds the average historical memory usage). These deviations may be caused by changes to the field workloads [15,49]. Such changes are common and may require performance analysts to update their load test cases [9,46].…”
Section: Introduction (mentioning)
confidence: 99%