Search engines are nowadays widely applied to store and analyze logs generated by largescale distributed systems. To adapt to various workload scenarios, log search engines such as Elasticsearch usually expose a large number of performance-related configuration parameters. As manual configuring is time consuming and labor intensive, automatically tuning configuration parameters to optimize performance has been an urgent need. However, it is challenging because: 1) Due to the complex implementation, the relationship between performance and configuration parameters is difficult to model and thus the objective function is actually a black box; 2) In addition to application parameters, JVM and kernel parameters are also closely related to the performance and together they construct a high dimensional configuration space; 3) To iteratively search for the best configuration, a tool is necessary to automatically deploy the newly generated configuration and launch tests to measure the corresponding performance. To address these challenges, this paper designs and implements HDConfigor, an automatic holistic configuration parameter tuning tool for log search engines. In order to solve the high dimensional optimization problem, we propose a modified Random EMbedding Bayesian Optimization algorithm (mREMBO) in HDConfigor which is a black-box approach. Instead of directly using a black-box optimization algorithm such as Bayesian optimization (BO), mREMBO first generates a lower dimensional embedded space through introducing a random embedding matrix and then performs BO in this embedded space. Therefore, HDConfigor is able to find a competitive configuration automatically and quickly. We evaluate HDConfigor in an Elasticsearch cluster with different workload scenarios. Experimental results show that compared with the default configuration, the best relative median indexing results achieved by mREMBO can reach 2.07×. In addition, under the same number of trials, mREMBO is able to find a configuration with at least a further 10.31% improvement in throughput compared to Random search, Simulated Annealing and BO. INDEX TERMS Log search engine, configuration parameter tuning, black-box optimization, Bayesian optimization, random embedding.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.