Some high performance computing (HPC) applications exhibit increasing real-time requirements, which call for effective means to predict their high execution times distribution. This is a new challenge for HPC applications but a well-known problem for real-time embedded applications where solutions already exist, although they target low-performance systems running single-threaded applications. In this paper, we show how some performance validation and measurement-based practices for real-time execution time prediction can be leveraged in the context of HPC applications on high-performance platforms, thus enabling reliable means to obtain real-time guarantees for those applications. In particular, the proposed methodology uses coordinately techniques that randomly explore potential timing behavior of the application together with Extreme Value Theory (EVT) to predict rare (and high) execution times to, eventually, derive probabilistic Worst-Case Execution Time (pWCET) curves. We demonstrate the effectiveness of this approach for an acoustic wave inversion application used for geophysical exploration.Mathematics 2020, 8, 314 2 of 21 modeling the propagation of hazardous substances after an accident, needs to be completed in a given short time frame in order to be useful.
MotivationSo far, timing guarantees have been mostly of interest for embedded systems with some form of criticality, such as those in avionics, automotive, space, industrial processes, and so forth. Therefore, technology to estimate execution time bounds already exists. However, target systems for such technology are often much simpler than those in HPC, and execution conditions are also far more controlled, with single-threaded applications statically scheduled in general [6]. Hence, whether such technology fits the specific requirements of HPC systems has not yet been studied and suitable techniques offering sufficient scalability and flexibility need to be identified and used appropriately.Software timing analysis technologies have been mostly investigated in the real-time domain. Some approaches target static modelling and abstract interpretation of the program execution of the system on a model of the hardware [7]. Those approaches have been proven suitable for simple systems with complete and accurate documentation of the timing of the platform, and of the execution flow-facts of the software (e.g., loop bounds). However, they have a number of limitations that make them ill-advised for complex systems [8]. Alternative approaches resorting to measurements rather than to static models have shown higher acceptance in industrial environments due to their ease to fit their problem [9]. However, increasingly complex systems bring increasing uncertainty on the execution conditions coverage of the tests used for timing modelling [8].As an alternative, a set of technologies based on a combination of platform control, data collection protocols and black-box statistical analysis have become popular in the last decade [10]. These technologies aim to prov...