In Cloud based Big Data applications, Hadoop has been widely adopted for distributed processing large scale data sets. However, the wastage of energy consumption of data centers still constitutes an important axis of research due to overuse of resources and extra overhead costs. As a solution to overcome this challenge, a dynamic scaling of resources in Hadoop YARN Cluster is a practical solution. This paper proposes a dynamic scaling approach in Hadoop YARN (DSHYARN) to add or remove nodes automatically based on workload. It is based on two algorithms (scaling up/down) which are implemented to automate the scaling process in the cluster. This article aims to assure energy efficiency and performance of Hadoop YARN’ clusters. To validate the effectiveness of DSHYARN, a case study with sentiment analysis on tweets about covid-19 vaccine is provided. the goal is to analyze tweets of the people posted on Twitter application. The results showed improvement in CPU utilization, RAM utilization and Job Completion time. In addition, the energy has been reduced of 16% under average workload.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.