A Simulator for Large-Scale Parallel Computer Architectures

Adalsteinsson, Helgi; Cranford, Scott; Evensky, David A.; Kenny, Joseph P.; Mayo, Jackson; Pınar, Ali; Janssen, Curtis L.

doi:10.4018/jdst.2010040104

Cited by 94 publications

(48 citation statements)

References 16 publications


“…To analyze the effects of integrating sPIN in next-generation networks, we extend the Cray Slingshot Simulator, which models a 200 Gib/s NIC (see Fig. 1) in Sandia Structural Simulation Toolkit (SST) [34], adding packet processing capabilities by implementing sPIN. We configure the network simulator to send 2KiB of payload data.…”

Section: Simulation Setup (mentioning)

confidence: 99%

Network-accelerated non-contiguous memory transfers

Girolamo, Taranov, Kurth, et al. 2019

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis


Applications often communicate data that is non-contiguous in the send or the receive buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non-contiguous transfers are well supported in HPC (e.g., MPI derived datatypes), they can still be up to 5x slower than contiguous transfers of the same size. As we enter the era of network acceleration, we need to investigate which tasks to offload to the NIC: In this work we argue that non-contiguous memory transfers can be transparently network-accelerated, truly achieving zero-copy communications. We implement and extend sPIN, a packet streaming processor, within a Portals 4 NIC SST model, and evaluate strategies for NIC-offloaded processing of MPI datatypes, ranging from datatype-specific handlers to general solutions for any MPI datatype. We demonstrate up to 10x speedup in the unpack throughput of real applications, demonstrating that non-contiguous memory transfers are a first-class candidate for network acceleration.



“…Here, the application is executed (emulated) on top of a simulator that is responsible to determine when each process is run. This approach allows researchers to study directly the behavior of MPI applications but only a few recent simulators such as SST Macro [12], SimGrid/SMPI [1] and the closed-source xSim [13] support it. To the best of our knowledge, only SST Macro and SimGrid/SMPI are mature enough to faithfully emulate HPL.…”

Section: Related Work (mentioning)

confidence: 99%

Fast and Faithful Performance Prediction of MPI Applications: the HPL Case Study

Cornebize, Legrand, Heinrich 2019

2019 IEEE International Conference on Cluster Computing (CLUSTER)


“…In addition, many multiprocessor simulators include an on-chip network, or support integration with a dedicated on-chip network simulator [8,20,30].…”

Section: Related Work (mentioning)

confidence: 99%

OpenSoC Fabric

Fatollahi-Fard, Donofrio, Shalf 2014

Proceedings of the 2014 International Workshop on Network on Chip Architectures


Recent advancements in technology scaling have sparked a trend towards greater integration with large-scale chips containing thousands of processors connected to memories and other I/O devices using non-trivial network topologies. Software simulation suffers from long execution times or reduced accuracy in such complex systems, whereas hardware RTL development is too time-consuming. We present OpenSoC Fabric, a parameterizable and powerful on-chip network generator for evaluating future large-scale chip multiprocessors and SoCs. OpenSoC Fabric leverages a new hardware DSL, Chisel, which contains powerful abstractions provided by its base language, Scala, and generates both software (C++) and hardware (Verilog) models from a single code base. This is in contrast to other tools readily available which typically provide either software or hardware models, but not both. The OpenSoC Fabric infrastructure is modeled after existing state-of-the-art simulators, offers large and powerful collections of configuration options, is open-source, and uses object-oriented design and functional programming to make functionality extension as easy as possible.

