Over 5000 publications on parallel discrete event simulation (PDES) have appeared in the literature to date. Nevertheless, few articles have focused on empirical studies of PDES performance on large supercomputer-based systems. This gap is bridged here by undertaking a parameterized performance study on thousands of processor cores of a Blue Gene supercomputing system. In contrast to theoretical insights from analytical studies, our study is based on an actual software implementation, incurring the real messaging and computational overheads of both conservative and optimistic synchronization approaches to PDES. Complex and counter-intuitive effects are uncovered and analyzed across different event timestamp distributions and levels of available concurrency in the synthetic benchmark models. The results are intended to provide guidance to the PDES community on how the synchronization protocols behave at high processor core counts on a state-of-the-art supercomputing system.

INTRODUCTION

A parallel discrete-event simulation (PDES) system consists of a collection of logical processes, or LPs, each modeling a distinct component of the system under study (e.g., a router in a physical communications network). LPs communicate by exchanging timestamped event messages (e.g., denoting the arrival of a new job at a server). The goal of PDES is to efficiently process all events in parallel in global timestamp order. Two well-established approaches toward this goal are broadly called conservative processing and optimistic processing.

The seminal parallel discrete-event processing approach, falling under the category of conservative processing, is the Null Message algorithm developed by Chandy and Misra (Chandy and Misra 1979) and Bryant (Bryant 1977), also known as the CMB algorithm after the inventors' names. In this algorithm, each logical process sends a "null message" to its neighboring processes upon executing each event. The null message carries a timestamp T that serves as a promise that the sending process will not later send a message with timestamp smaller than T to the receiving process. If the next local event at a process has a timestamp greater than any of the received null message timestamps, that process must wait until it receives the next wave of null messages such that all null message timestamps are greater than the timestamp of the next event to be processed. It has been shown that this algorithm avoids deadlock provided there is no cycle that a message could traverse without incrementing its timestamp (i.e., timestamp increments must be non-zero).

Alternatives to the Null Message algorithm for conservative execution employ a "global synchronization" approach. Prominent examples are the Bounded Lag algorithm (Lubachevsky, Shwartz, and Weiss 1991), Time Buckets (Steinman 1993), YAWNS (Dickens et al. 1996), and more recently Composite Synchronization (Nicol and Liu 2002). In its simplest form, each LP is allowed to process events between the ...
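To make the CMB safety condition above concrete, the following is a minimal sketch in Python of null-message bookkeeping for a single LP. The names (LP, send_null, LOOKAHEAD) and the fixed lookahead value are illustrative assumptions for this sketch, not part of any existing PDES package.

    import heapq
    from dataclasses import dataclass, field

    LOOKAHEAD = 1.0  # assumed minimum timestamp increment on every outgoing link

    @dataclass
    class LP:
        name: str
        in_clocks: dict = field(default_factory=dict)   # sender name -> latest promised timestamp
        events: list = field(default_factory=list)      # heap of (timestamp, payload)

        def safe_time(self):
            # CMB safety condition: only events no later than the minimum
            # promise over all input channels may be processed.
            return min(self.in_clocks.values(), default=float("inf"))

        def process_safe_events(self):
            done = []
            while self.events and self.events[0][0] <= self.safe_time():
                done.append(heapq.heappop(self.events))  # model-specific handling would go here
            return done

    def send_null(sender_name, receiver, now):
        # A null message sent at time `now` promises that nothing earlier than
        # now + LOOKAHEAD will ever arrive on this channel.
        receiver.in_clocks[sender_name] = max(
            receiver.in_clocks.get(sender_name, 0.0), now + LOOKAHEAD)

    # Usage: B may process its event at t=2.0 once A has promised t=2.5 on the A->B channel.
    b = LP("B", in_clocks={"A": 0.0})
    heapq.heappush(b.events, (2.0, "job arrival"))
    send_null("A", b, now=1.5)
    print(b.process_safe_events())  # [(2.0, 'job arrival')]

The strictly positive LOOKAHEAD is what makes every channel clock advance around any cycle, which is precisely the non-zero timestamp increment required above for deadlock avoidance.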
ROSS.Net brings together the four major areas of networking research: network modeling, simulation, measurement, and protocol design. ROSS.Net is a tool for conducting large-scale design-of-experiments studies through components such as a discrete-event simulation engine, default and extensible model designs, and a state-of-the-art XML interface. ROSS.Net reads in predefined descriptions of network topologies and traffic scenarios, which allows for in-depth analysis of, and insight into, emerging feature interactions, cascading failures, and protocol stability in a variety of situations. Developers are able to design and implement their own protocol designs, network topologies, and modeling scenarios, as well as implement existing platforms within ROSS.Net. Using ROSS.Net, designers are also able to create experiments with varying levels of granularity, allowing for a high degree of scalability.
Recently, Time Warp has been shown to achieve good strong scaling to hundreds of thousands of processors on modern supercomputer systems. These results were achieved on the Cray and IBM Blue Gene supercomputing platforms. In this paper, we investigate the cache memory performance of ROSS Time Warp on (i) a commodity shared-memory desktop system based on the Intel E5504 processor and (ii) the IBM Blue Gene/L when configured to run over the standard Message Passing Interface (MPI) library.
We present a performance analysis of a highly accurate, large-scale electromagnetic wave propagation model on two modern supercomputing platforms: the Cray XT5 and the IBM Blue Gene/L. The electromagnetic wave model is used to simulate the physical layer of a large-scale mobile ad hoc network of radio devices. The model is based on the Transmission Line Matrix numerical technique and is implemented in a Time Warp simulation package that employs reverse computation for its rollback mechanism. Using Rensselaer's Optimistic Simulation System, we demonstrate better-than-real-time, scalable parallel performance for network scenarios containing up to one million mobile radio devices, highly accurate RF propagation, and high-resolution, large-scale complex terrain.
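The rollback mechanism mentioned here relies on reverse computation: each forward event handler is paired with a reverse handler that undoes its effect when Time Warp rolls the event back. The following toy sketch in Python, assuming an LP whose state is just a queue-length counter, illustrates the idea; the names forward_event and reverse_event and the saved control-flow bits are illustrative and do not reflect ROSS's actual API.

    from dataclasses import dataclass

    @dataclass
    class LPState:
        queue_len: int = 0
        dropped: int = 0

    def forward_event(state, capacity):
        """Process one arrival and return the minimal bits needed to reverse it."""
        if state.queue_len >= capacity:
            state.dropped += 1        # non-invertible branch: remember that it was taken
            return {"was_dropped": True}
        state.queue_len += 1          # invertible update: nothing extra to save
        return {"was_dropped": False}

    def reverse_event(state, bits):
        """Undo forward_event exactly, using only the saved control-flow bits."""
        if bits["was_dropped"]:
            state.dropped -= 1
        else:
            state.queue_len -= 1

    # Usage: rolling back restores the original state without copying it wholesale.
    s = LPState()
    bits = forward_event(s, capacity=4)
    reverse_event(s, bits)
    assert s == LPState()

Saving only a few control-flow bits per event, rather than a full copy of the LP state, is the design choice that distinguishes reverse computation from conventional state saving.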