Modern graphics processing units (GPUs) can perform general-purpose computations in addition to their native, specialized graphics operations. Owing to the highly parallel nature of graphics processing, the GPU has evolved into a many-core coprocessor that supports high data parallelism. Its performance has been growing at roughly the rate of squared Moore's law, and its peak floating-point performance exceeds that of the CPU by an order of magnitude. It is therefore a viable platform for time-sensitive and computationally intensive applications.

Lattice Boltzmann model (LBM) computations are carried out via linear operations at discrete lattice sites, so they can be implemented efficiently on a GPU-based architecture. Our simulations produce results comparable to the CPU version while improving performance by an order of magnitude. We have demonstrated that the GPU is well suited for interactive simulations in many applications, including fire, smoke, lightweight objects in wind, jellyfish swimming in water, and heat shimmering and mirage (using the hybrid thermal LBM).

We further advocate the use of a GPU cluster for large-scale LBM simulations and for high-performance computing. The Stony Brook Visual Computing Cluster has been the platform for several applications, including real-time simulation of plume dispersion in complex urban environments and simulation of thermal fluid dynamics in a pressurized water reactor. Major GPU vendors have been targeting the high-performance computing market with dedicated GPU hardware. Software toolkits such as NVIDIA CUDA provide a convenient development platform that abstracts the GPU and exposes its underlying stream-computing architecture. However, programming a GPU cluster remains a challenging task. We have therefore developed the Zippy framework to simplify GPU cluster programming. Zippy