Load sharing for optimistic parallel simulations on multi core machines

Proceedings of the 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation

Quaglia

2015

Self Cite

It is well known that Time Warp may suffer from large usage of memory, which may hamper the efficiency of the memory hierarchy. To cope with this issue, several approaches have been devised, mostly based on the reduction of the amount of used virtual memory, e.g., by the avoidance of checkpointing and the exploitation of reverse computing. In this article we present an orthogonal solution aimed at optimizing the latency of memory access operations when running Time Warp systems on Non-Uniform Memory Access (NUMA) multi-processor/multi-core computing systems. More in detail, we provide an innovative Linux-based architecture allowing per simulation-object management of memory segments made up of disjoint sets of pages, and supporting both static and dynamic binding of the memory pages reserved for an individual object to the different NUMA nodes, depending on which worker thread is in charge of running that simulation object along a given wall-clock-time window. Our proposal not only manages the virtual pages used for the live state image of the simulation object, but also copes with memory pages destined to keep the simulation object's event buffers and any recoverability data. Further, the architecture allows memory access optimization for data (messages) exchanged across the different simulation objects running on the NUMA machine. Our proposal is fully transparent to the application code, thus operating in a seamless manner. Also, a free software release of our NUMA memory manager for Time Warp has been made available within the open source ROOT-Sim simulation platform. Experimental data for an assessment of our innovative proposal are also provided in this article.
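The abstract above describes dynamic binding of an object's pages to the NUMA node of whichever worker thread currently runs that object. The following Python sketch illustrates only the bookkeeping side of that idea, under assumptions not taken from the paper: the names ObjectSegment, home_node and plan_migrations, and the dictionary-based thread/core/node mappings, are hypothetical. It only computes which pages would have to move; on Linux the actual transfer could be requested with the move_pages(2) system call, but the sketch does not claim this is how ROOT-Sim implements it.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectSegment:
    """Hypothetical per-object memory segment: a disjoint set of pages,
    each tagged with the NUMA node it currently resides on."""
    object_id: int
    page_nodes: dict = field(default_factory=dict)  # page id -> NUMA node


def home_node(object_id, object_to_thread, thread_to_core, core_to_node):
    """NUMA node of the CPU core hosting the worker thread that currently
    runs the given simulation object."""
    return core_to_node[thread_to_core[object_to_thread[object_id]]]


def plan_migrations(segments, object_to_thread, thread_to_core, core_to_node):
    """List (object_id, page_id, src_node, dst_node) for every page that is
    not on the home node of its object's current worker thread."""
    plan = []
    for seg in segments:
        dst = home_node(seg.object_id, object_to_thread, thread_to_core, core_to_node)
        for page, src in sorted(seg.page_nodes.items()):
            if src != dst:
                plan.append((seg.object_id, page, src, dst))
    return plan


# Two NUMA nodes: cores 0-1 on node 0, cores 2-3 on node 1.
core_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
thread_to_core = {0: 0, 1: 2}
# After a load-sharing step, object 7 has been rebound to thread 1.
object_to_thread = {7: 1, 8: 0}
segments = [ObjectSegment(7, {100: 0, 101: 0, 102: 1}), ObjectSegment(8, {200: 0})]
print(plan_migrations(segments, object_to_thread, thread_to_core, core_to_node))
# [(7, 100, 0, 1), (7, 101, 0, 1)]
```

In this toy setup object 7 now runs on a thread hosted by node 1, so its two pages still on node 0 are listed, while object 8 needs no migration.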

Section: Architectural Context (mentioning)

“…for load-sharing purposes [36]) and when the worker threads are switched across CPU-cores operating in different NUMA nodes;…”

Section: The Page Migration Daemon (mentioning)

Section: Introduction (mentioning)


NUMA Time Warp

Proceedings of the 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation

Quaglia

2015

Self Cite

“…This can improve the usage of computing resources while carrying out speculative processing of DES models, by reducing negative effects of speculation, such as the rollback frequency. This is the objective of classical load balancing/sharing approaches proposed in the literature (see, e.g., [19][20][21]). However, these proposals consider only explicit interactions supported via the classical event cross-scheduling approach.…”

Section: Introduction (mentioning)

“…In the context of PDES, several works have studied the problem of finding the best binding between LPs and worker threads (see, e.g., [16,25,26,21,27]). Nevertheless, none of these works has ever used information related to the interaction between LPs to explicitly reduce the (possible) negative effects of optimistic simulation runs.…”

Section: Introduction (mentioning)

Load-Sharing Policies in Parallel Simulation of Agent-Based Demographic Models

Euro-Par 2016: Parallel Processing Workshops

Montañola-Sales,

Quaglia

et al. 2017

Self Cite

Abstract. Execution parallelism in Agent-Based Simulation (ABS) makes it possible to deal with complex, large-scale models. This raises the need for runtime environments able to fully exploit hardware parallelism while also offering ABS-suited programming abstractions. In this paper, we target latest-generation Parallel Discrete Event Simulation (PDES) platforms for multicore systems. We discuss a programming model that supports both implicit (in-place access) and explicit (message passing) interactions across concurrent Logical Processes (LPs). We discuss different load-sharing policies combining event rate and implicit/explicit LP interactions. We present a performance study conducted on a synthetic test case representative of a class of agent-based models.
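To make the idea of a load-sharing policy that combines event rate with implicit/explicit interactions more concrete, here is a minimal greedy sketch. It is not the policy evaluated in the paper: the scoring rule (projected thread load minus alpha times the interaction weight with LPs already placed on that thread), the function name assign_lps and the parameter alpha are illustrative assumptions.

```python
def assign_lps(event_rate, interactions, n_threads, alpha=1.0):
    """Greedy LP-to-worker-thread placement (illustrative sketch).

    event_rate: dict LP id -> observed event rate, used as the load metric.
    interactions: dict (lp_a, lp_b) -> interaction weight, assumed to count
        both implicit (in-place access) and explicit (message) interactions.
    alpha: weight of co-location affinity relative to load balance.
    Returns dict LP id -> worker thread index.
    """
    def weight(a, b):
        # Interactions are treated as undirected.
        return interactions.get((a, b), 0.0) + interactions.get((b, a), 0.0)

    load = [0.0] * n_threads
    members = [[] for _ in range(n_threads)]
    placement = {}
    # Place the busiest LPs first; ties are broken by LP id for determinism.
    for lp in sorted(event_rate, key=lambda x: (-event_rate[x], x)):
        def score(t):
            affinity = sum(weight(lp, other) for other in members[t])
            return load[t] + event_rate[lp] - alpha * affinity
        best = min(range(n_threads), key=lambda t: (score(t), t))
        placement[lp] = best
        load[best] += event_rate[lp]
        members[best].append(lp)
    return placement


rates = {0: 10.0, 1: 10.0, 2: 5.0, 3: 5.0}
links = {(0, 3): 20.0, (1, 2): 20.0}
print(assign_lps(rates, links, 2))             # {0: 0, 1: 1, 2: 1, 3: 0}
print(assign_lps(rates, links, 2, alpha=0.0))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

With the affinity term enabled, the strongly interacting pairs (0, 3) and (1, 2) end up on the same thread; with alpha=0.0 the placement only balances event rates and splits both pairs. A larger alpha favors co-location at the price of a potentially larger load imbalance.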

On power capping and performance optimization of multithreaded applications

Conoci

Sanzo

Concurrency and Computation

et al. 2021

Self Cite

Summary: Multithreaded applications facilitate the exploitation of the computing power of multicore architectures. On the other hand, these applications can become extremely energy-intensive, in contrast with the need for limiting the energy usage of computing systems. In this article, we explore the design of techniques enabling multithreaded applications to maximize their performance under a power cap. We consider two control parameters: the number of cores used by the application, and the core power state. We target the design of an autotuning power-capping technique with minimal intrusiveness and high portability, which is agnostic about the workload profile of the application. We investigate two different approaches for building the strategy for selecting the best configuration of the parameters under control, namely a heuristic approach and a model-based approach. Through an extensive experimental study, we evaluate the effectiveness of the proposed technique considering two different selection strategies, and we compare them with existing solutions.
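The abstract mentions a heuristic strategy for choosing the number of cores and the core power state under a power cap. As a rough illustration only, the sketch below uses generic hill climbing over (cores, pstate) pairs; it is not the authors' heuristic. The measure hook, the neighborhood definition and the toy throughput/power model are assumptions; on real hardware, power might be read, e.g., from Intel RAPL energy counters.

```python
def tune(measure, max_cores, n_pstates, power_cap):
    """Hill-climb over (cores, pstate) configurations under a power cap.

    measure(cores, pstate) -> (throughput, power) is a hypothetical hook
    that runs the application briefly in the given configuration.
    pstate 0 is assumed to be the lowest-frequency power state.
    Returns ((cores, pstate), throughput), or None if even (1, 0) exceeds the cap.
    """
    best = (1, 0)
    best_thr, best_pow = measure(*best)
    if best_pow > power_cap:
        return None
    improved = True
    while improved:
        improved = False
        c, p = best
        # Neighbors: add a core, raise frequency, or trade one for the other.
        for nc, np_ in ((c + 1, p), (c, p + 1), (c - 1, p + 1), (c + 1, p - 1)):
            if not (1 <= nc <= max_cores and 0 <= np_ < n_pstates):
                continue
            thr, pw = measure(nc, np_)
            if pw <= power_cap and thr > best_thr:
                best, best_thr, improved = (nc, np_), thr, True
    return best, best_thr


# Toy model with made-up numbers: scalability stops at 6 threads and
# dynamic power grows with the square of the frequency.
FREQ = [1.0, 1.5, 2.0]


def toy_measure(cores, pstate):
    f = FREQ[pstate]
    return min(cores, 6) * f, 10.0 + 4.0 * cores * f * f


print(tune(toy_measure, max_cores=8, n_pstates=3, power_cap=60.0))
# ((5, 1), 7.5)
```

Hill climbing only guarantees a local optimum; in this toy model it happens to reach the best feasible configuration (5 cores at the middle power state), whereas a model-based approach would instead predict throughput and power to avoid measuring every step.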