Nowadays, companies are faced with the task of processing huge quantum of data. As the traditional database systems cannot handle this task in a cost-efficient manner, companies have built customized data processing frameworks. Cloud computing has emerged as a promising approach to rent a large IT infrastructure on a short-term pay-per-usage basis. This paper attempts to schedule tasks on compute nodes so that data sent from one node to the other has to traverse as few network switches as possible. The challenges and opportunities for efficient parallel data processing in cloud environments have been demonstrated and Nephele, the first data processing framework, has been presented to exploit the dynamic resource provisioning offered by the IaaS clouds. The overall utilisation of resources has been improved by assigning specific virtual machine types to specific tasks of a processing job and by automatically allocating or deallocating virtual machines in the course of a job execution. This has led to substantial reduction in the cost of parallel data processing.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.