In recent years, computational science and engineering (CSE) simulations running on high-performance computing resources have been actively used to solve complex domain-specific problems. Thanks to remarkable advances in information technology, the CSE community is tackling more complex and difficult problems than ever before by running these simulations online. In this setting, we often observe that 1) online simulation users know little about the estimated termination time of their launched simulations and 2) limited computing resources are squandered by wrong input that causes simulations to run forever. To address these issues, we propose a novel execution time estimation scheme, termed EXTES, that uses machine learning techniques for more efficient online CSE simulations. Using a large amount of existing provenance data, the EXTES scheme trains a suite of models rooted in classification, regression, and a hybrid of the two, and utilizes these models to estimate the execution time of simulations for specified input parameters. In experiments on real simulation data, our proposed models achieved about 73% accuracy on average in execution time estimation across 16 simulation programs taken from a variety of CSE fields. Meanwhile, the overhead incurred by training and estimation is almost negligible.
Measuring execution time is one of the most widely used performance evaluation techniques in computer science research. Inaccurate measurements cannot be used for a fair performance comparison between programs. Despite its prevalence, the intrinsic variability of time measurement makes it hard to obtain repeatable and accurate timing results for a program running on an operating system. We propose a novel execution time measurement protocol (termed EMP) for measuring the execution time of a compute-bound program on Linux while minimizing that measurement's variability. During the development of the protocol, we identified several factors that disturb execution time measurement. We introduce successive refinements to the protocol that address each of these factors and, in concert, reduce variability by more than an order of magnitude. We also introduce a new visualization technique, which we term the 'dual-execution scatter plot', that highlights infrequent, long-running daemons, differentiating them from frequent and/or short-running daemons. Our empirical results show that the proposed protocol successfully achieves three major aspects of execution time measurement (precision, accuracy, and scalability) and that it works for both open-source and proprietary software.
Figure 2. Execution time measurements of an 8-s compute-bound process.
This protocol uses the novel device of a dual-execution scatter plot to highlight what we term L-samples, to identify within those samples infrequent, long-running daemons, and thus to determine in a disciplined way cutoffs to remove samples with such daemon executions. We examine how this protocol applies to proprietary as well as open-source programs.
Figure 3. Process time versus elapsed time.
We evaluate the performance of EMP by rigorous experiments, starting from a simple program in pure-computation mode and moving to a popular CPU-bound benchmark suite, the SPEC benchmark [2].
Our empirical results strongly support the effectiveness and scalability of EMP.
The following section discusses accuracy and precision in execution time measurement and then describes the timing mechanisms in Linux and their limitations. In Section 3, we introduce our EMP. Section 4 explicates the disturbing factors and presents experimental results detailing the successive refinement of this protocol. The evaluation of the protocol continues with more realistic scenarios. We then review the literature of the last 30+ years related to execution time measurement. We conclude by discussing future work, including an intriguing phenomenon that our refined protocol uncovered.
BACKGROUND
This section gives background on timing a program on Linux. Specifically, we clarify accuracy and precision in timing and discuss the Linux timing mechanism and some of its limitations.
Accuracy and precision in timing
The concepts of accuracy and precision in time measurement should be carefully differentiated. The accuracy of any measurement is the "closeness of agreement between a measured quantity value and a true quantity value...
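The distinction between process time and elapsed time, and the run-to-run spread that precision refers to, can be illustrated with a minimal Python sketch. This is not the EMP protocol itself, only an illustration under stated assumptions: `busy_work`, `time_once`, and `summarize` are hypothetical helper names, the workload is an arbitrary compute-bound loop, and the relative standard deviation is used here as one simple indicator of spread. Assessing accuracy would additionally require a known true value, which this sketch does not have.

```python
from __future__ import annotations

import statistics
import time


def busy_work(n: int) -> int:
    """Hypothetical compute-bound workload: sum of squares of 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_once(n: int) -> tuple[float, float]:
    """Return (elapsed_seconds, process_seconds) for one run of busy_work.

    perf_counter() is a wall-clock style timer, so it includes time spent
    waiting while other processes (e.g. daemons) run. process_time() counts
    only user + system CPU time charged to this process.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    busy_work(n)
    cpu_end = time.process_time()
    wall_end = time.perf_counter()
    return wall_end - wall_start, cpu_end - cpu_start


def summarize(samples: list[float]) -> tuple[float, float]:
    """Return (mean, relative standard deviation) of at least two samples."""
    mean = statistics.mean(samples)
    rsd = statistics.stdev(samples) / mean if mean > 0 else 0.0
    return mean, rsd


if __name__ == "__main__":
    # Repeat the same measurement; the spread across runs reflects precision.
    runs = [time_once(200_000) for _ in range(10)]
    elapsed_mean, elapsed_rsd = summarize([r[0] for r in runs])
    process_mean, process_rsd = summarize([r[1] for r in runs])
    print(f"elapsed: mean={elapsed_mean:.6f}s rsd={elapsed_rsd:.2%}")
    print(f"process: mean={process_mean:.6f}s rsd={process_rsd:.2%}")
```

On a busy machine the elapsed time would typically exceed the process time, since only the former includes intervals in which the process was not running; the actual numbers depend on the system and are not meant to reproduce any result from the paper.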