Abstract: More and more Internet companies rely on large-scale data analysis as part of their core services, for tasks such as log analysis, feature extraction, and data filtering. Map-Reduce, through its Hadoop implementation, has proved to be an efficient model for dealing with such data. One important challenge in such analysis is predicting the performance of individual jobs. In this paper, we propose a simple framework to predict the performance of Hadoop jobs. It is composed of a dynamic, lightweight Hadoop job analyzer and a prediction module that uses locally weighted regression. Our framework makes some theoretical cost models more practical and adapts well to the diversity of jobs and clusters. It can also help users who want to estimate the cost when applying for an on-demand cloud service. Finally, we run experiments to verify our framework.
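To make the prediction step concrete, here is a minimal sketch of locally weighted linear regression on a single feature. It is not the framework's actual implementation: the function name `lwr_predict`, the Gaussian kernel, the bandwidth `tau`, and the choice of input size as the only feature are all assumptions made for illustration, and the sample history is made up.

```python
import math


def lwr_predict(xs, ys, x0, tau=1.0):
    """Predict y at query x0 by locally weighted linear regression (one feature).

    xs, ys: past observations (hypothetical: input size in GB, runtime in s).
    tau: assumed Gaussian kernel bandwidth; smaller means a more local fit.
    """
    # Gaussian kernel: observations close to the query get larger weights.
    weights = [math.exp(-((x - x0) ** 2) / (2.0 * tau ** 2)) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; increase tau")

    # Weighted means of the feature and the target.
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total

    # Closed-form weighted least squares for a straight line.
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if var == 0.0:
        # Every weighted point shares one x value: fall back to the weighted mean.
        return mean_y
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * x0


if __name__ == "__main__":
    # Made-up job history: (input size in GB, observed runtime in seconds).
    sizes = [1.0, 2.0, 4.0, 8.0, 16.0]
    runtimes = [35.0, 61.0, 118.0, 240.0, 490.0]
    print(lwr_predict(sizes, runtimes, x0=6.0, tau=3.0))
```

Because a separate line is fitted around each query point, the estimate can follow non-linear trends in job history without a global model. A real predictor would presumably use several job features rather than input size alone.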
The energy efficiency of cloud computing has recently attracted a great deal of attention. As a result of raised expectations, cloud providers such as Amazon and Microsoft have started to deploy a new IaaS service, a MapReduce-style virtual cluster, to process data-intensive workloads. Considering that the IaaS provider supports multiple pricing options, we study batch-oriented consolidation and online placement for reserved virtual machines (VMs) and on-demand VMs, respectively. For batch cases, we propose a DVFS-based heuristic, TRP-FS, to consolidate virtual clusters on physical servers to save energy while guaranteeing job SLAs. We derive the most efficient frequency, the one that minimizes energy consumption, and prove an upper bound on the energy savings achievable through DVFS techniques. More interestingly, this frequency depends only on the type of processor. FS can also be used in combination with other consolidation algorithms. For online cases, a time-balancing heuristic, OTB, is designed for on-demand placement; it reduces mode switching by balancing server duration and utilization. Experimental results, both in simulation and on a Hadoop testbed, show that our approach achieves greater energy savings than existing algorithms.