This paper proposes a method that automates the identification of a set of configuration parameters achieving optimal performance for a benchmark program in HDFS. Performance optimization of Hadoop processes is tedious and challenging because the system's organization is complex and the list of configuration parameters to consider is extensive. We develop an Automated Benchmarking Configuration Method (ABCM) to identify the set of configuration parameters that minimizes the execution time of a benchmark, specifically TestDFSIO Write and Read. To mitigate the exponential computation time that an exhaustive search would otherwise require, we propose a two-phase configuration parameter selection process with a simple sampling technique. Using the proposed technique, we automatically found the sets of the top five selected optimal configuration parameters, which reduced the average execution time by 32% compared with the execution time under the default set of Hadoop configuration parameters.
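To make the idea of a two-phase selection with sampling concrete, the following is a minimal Python sketch under stated assumptions; it is not the ABCM implementation, whose details the abstract does not give. It assumes that phase one ranks parameters by varying one at a time from the defaults and that phase two randomly samples combinations of the selected parameters. The property names are real Hadoop properties, but the candidate values, the defaults as written here, and the `run_benchmark` cost function are hypothetical stand-ins for actual TestDFSIO runs.

```python
import itertools
import random

# Hypothetical search space: candidate values are illustrative only.
SEARCH_SPACE = {
    "dfs.blocksize": [64, 128, 256],            # MB (illustrative units)
    "dfs.replication": [1, 2, 3],
    "io.file.buffer.size": [4, 64, 128],        # KB (illustrative units)
    "dfs.namenode.handler.count": [10, 20, 40],
}
# Assumed baseline configuration for this sketch.
DEFAULTS = {
    "dfs.blocksize": 128,
    "dfs.replication": 3,
    "io.file.buffer.size": 4,
    "dfs.namenode.handler.count": 10,
}


def run_benchmark(config):
    """Synthetic stand-in for a benchmark run; returns a made-up execution time."""
    t = 100.0
    t += abs(config["dfs.blocksize"] - 256) * 0.1
    t += config["dfs.replication"] * 8.0
    t -= min(config["io.file.buffer.size"], 64) * 0.2
    t += abs(config["dfs.namenode.handler.count"] - 20) * 0.05
    return t


def phase_one(space, defaults, keep):
    """Rank parameters by their best single-parameter gain over the defaults."""
    baseline = run_benchmark(defaults)
    gains = {}
    for name, values in space.items():
        best = min(run_benchmark({**defaults, name: v}) for v in values)
        gains[name] = baseline - best
    return sorted(gains, key=gains.get, reverse=True)[:keep]


def phase_two(space, defaults, selected, samples, rng):
    """Sample combinations of the selected parameters instead of trying all."""
    grid = list(itertools.product(*(space[n] for n in selected)))
    tried = rng.sample(grid, min(samples, len(grid)))
    scored = []
    for combo in tried:
        config = {**defaults, **dict(zip(selected, combo))}
        scored.append((run_benchmark(config), config))
    scored.sort(key=lambda pair: pair[0])  # fastest configuration first
    return scored


def tune(keep=2, samples=5, seed=0):
    """Run both phases and return (selected parameters, ranked configurations)."""
    rng = random.Random(seed)
    selected = phase_one(SEARCH_SPACE, DEFAULTS, keep)
    ranked = phase_two(SEARCH_SPACE, DEFAULTS, selected, samples, rng)
    return selected, ranked
```

In this sketch, phase one costs one run per candidate value, and phase two is capped at `samples` runs, so the total number of benchmark runs stays far below the size of the full Cartesian product of all parameter values. Replacing `run_benchmark` with a function that launches the real benchmark and measures its wall-clock time would be the natural next step.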