Nowadays, users must store, scrutinise, and process massive datasets from various fields, including science, business, and research. As a result, they require data-intensive platforms with ample storage and processing power. In addition, many such platforms must provide features like parallel processing, fault tolerance, data dissemination, scalability, availability, and load balancing. Google developed the MapReduce programming paradigm to address this problem, and it served as the foundation for Apache's open-source Hadoop project. Hadoop relies on a dedicated file system, the Hadoop Distributed File System (HDFS), which is analogous to the Google File System (GFS). HDFS splits massive datasets into equally sized blocks and places them across multiple nodes in a Hadoop cluster [1]. As a result, Hadoop is now widely accepted as a data analytics model [2].

Hadoop's fundamental operating principle is that "moving computation to data is less expensive than moving data to computation." Accordingly, Hadoop tries to schedule tasks on the nodes that hold their input data in order to minimise network traffic [3]. Task scheduling is critical in Hadoop because it significantly affects the framework's computation time and, thus, its overall performance [4]. However, given the dynamic nature of the cloud environment, designing an effective task-scheduling strategy remains a constant challenge. Nevertheless, only a few studies have analysed the proposed techniques and their overall effect on the Hadoop framework's performance.
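To make the block-placement idea concrete, the short Java sketch below lists where HDFS has placed the blocks of a file, using Hadoop's public FileSystem API. The NameNode URI and the file path are hypothetical placeholders; the cluster's block size is governed by the dfs.blocksize setting.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationInspector {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's URI.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical input file already stored in HDFS.
        Path file = new Path("/data/input/dataset.csv");
        FileStatus status = fs.getFileStatus(file);

        // Each BlockLocation describes one fixed-size block and the
        // DataNodes holding its replicas.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```

The host lists printed here are exactly what a locality-aware scheduler consults when deciding where to run each map task.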
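The data-locality principle can likewise be illustrated with a minimal sketch. The Java fragment below is not Hadoop's actual scheduler; it is a simplified, hypothetical assigner that, when a node frees a slot, prefers a pending task whose input block already resides on that node and falls back to a remote (non-local) task otherwise.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/**
 * Minimal, hypothetical sketch of locality-first task assignment
 * (not Hadoop's real scheduler code): prefer node-local tasks so the
 * input block never crosses the network.
 */
public class LocalityFirstAssigner {
    record Task(String id, Set<String> hostsWithData) {}

    static Task nextTaskFor(String freeNode, Deque<Task> pending) {
        Iterator<Task> it = pending.iterator();
        while (it.hasNext()) {
            Task t = it.next();
            if (t.hostsWithData().contains(freeNode)) {
                it.remove();
                return t;          // node-local: input needs no network hop
            }
        }
        return pending.poll();     // no local work left: run a remote task
    }

    public static void main(String[] args) {
        Deque<Task> pending = new ArrayDeque<>(List.of(
                new Task("m1", Set.of("node-a", "node-b")),
                new Task("m2", Set.of("node-c"))));
        System.out.println(nextTaskFor("node-c", pending).id()); // m2 (local)
        System.out.println(nextTaskFor("node-c", pending).id()); // m1 (remote)
    }
}
```

Even this toy version shows the trade-off a real scheduler must manage: insisting on locality keeps network traffic low, but falling back to remote tasks keeps otherwise idle slots busy.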