2017
DOI: 10.1016/j.simpat.2017.05.009
MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems

Cited by 29 publications (24 citation statements); references 10 publications.
“…(8) Problem of finding the best hardware configuration for a given problem and its implementation (CPU/GPU/other accelerators/hybrid), considering the relative performance of CPUs, GPUs, interconnects, etc. Certain environments such as MERPSYS [76] allow simulation of parallel application execution on various hardware, including compute devices such as CPUs and GPUs, but the process requires prior calibration on small systems and target applications. (9) Lack of standardized APIs for new technologies such as NVRAM in parallel computing.…”
Section: Challenges In Modern High-performance Computingmentioning
confidence: 99%
“…algorithm: ring, Rabenseifner, pre-reduced ring (PRR) and sorted linear tree (SLT); size (of data vector): 128 K, 512 K, 1 M, 2 M, 4 M, 8 M floats (4 bytes each); mode (of process delay): one-late (only one process is delayed by maxDelay) and rand-late (all processes are delayed randomly, up to maxDelay); maxDelay (of process arrival times): 0, 1, 5, 10, 50, 100, 500, 1000 ms; P (number of processes/nodes): 4, 6, 8, 10, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48; N (number of iterations): 64-256, depending on maxDelay (more for lower delay). Table 3 presents the results of the benchmark execution for 1 M floats of reduced data over a 1 Gbps Ethernet network, where only one process was delayed, on 48 nodes of the Tryton [14] HPC cluster. The results are presented as absolute values of average elapsed time ē_alg and speedup s_alg relative to the ring algorithm, s_alg = ē_ring / ē_alg, where alg is the evaluated algorithm.…”
Section: Environment and Test Setupmentioning
confidence: 99%
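The speedup metric quoted above (s_alg = ē_ring / ē_alg) can be sketched in a few lines. The timings and the `speedup` helper below are purely illustrative assumptions, not data from the cited benchmark:

```python
# Sketch: speedup of each allreduce algorithm relative to the ring
# baseline, s_alg = e_ring / e_alg, using average elapsed times.
# The millisecond values here are hypothetical placeholders.
avg_elapsed_ms = {
    "ring": 120.0,          # baseline algorithm
    "rabenseifner": 95.0,
    "prr": 88.0,            # pre-reduced ring
    "slt": 105.0,           # sorted linear tree
}

def speedup(times, baseline="ring"):
    """Return {algorithm: baseline_time / algorithm_time}."""
    e_base = times[baseline]
    return {alg: e_base / e for alg, e in times.items()}

s = speedup(avg_elapsed_ms)
```

By construction the baseline always has speedup 1.0, and any algorithm faster than ring scores above 1.0.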
“…
• analysis of the trade-off to find potential points where values of measures incorporating execution time and energy used would be optimal for a specific application,
• benchmarking other applications, especially those that draw more power from our testbed systems,
• power-aware modeling of compute devices in frameworks for simulation of application runs in high performance computing environments such as MERPSYS [23],
• development of a tool for automatic detection of the optimal power settings for the aforementioned time-energy measures using historical data (e.g. via machine learning),
• proposing a new method for dynamically minimizing electrical energy usage at runtime for various HPC/cloud workloads [24].…”
Section: Final Remarks and Future Workmentioning
confidence: 99%