Cosmological simulators are an important component in the study of the formation of galaxies and large-scale structure, and can help answer many important questions about the universe. Despite their utility, existing parallel simulators do not scale effectively on modern machines containing thousands of processors. In this paper we present ChaNGa, a recently released production simulator based on the Charm++ infrastructure. To achieve scalable performance, ChaNGa employs various optimizations that maximize the overlap between computation and communication. We present experimental results of ChaNGa simulations on machines with thousands of processors, including the IBM Blue Gene/L and the Cray XT3. The paper goes on to highlight efforts toward even more efficient and scalable cosmological simulations. In particular, we discuss novel load balancing schemes that base their decisions on characteristics specific to tree-based particle codes. Further, we present the multistepping capabilities of ChaNGa, along with solutions to the load imbalance that such multiphase simulations face. We outline key requirements for an effective practical implementation and conclude by discussing preliminary results from simulations run with our multiphase load balancer.
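To make the multistepping idea concrete, here is a minimal sketch in C++ that assigns particles to power-of-two timestep "rungs" using a standard acceleration-based criterion. It is not ChaNGa source code; the names, fields, and the particular criterion are illustrative assumptions.

```cpp
// Minimal sketch (not ChaNGa source): assigning particles to power-of-two
// timestep "rungs", the idea behind multistepped simulations.
#include <cmath>
#include <cstdio>
#include <vector>

struct Particle {
    double accel;   // magnitude of gravitational acceleration
    double soft;    // gravitational softening length
    int    rung;    // 0 = largest timestep; higher rungs step more often
};

// Choose the smallest rung whose timestep satisfies dt <= eta * sqrt(soft / accel).
int chooseRung(const Particle& p, double dtMax, double eta, int maxRung) {
    double dtWanted = eta * std::sqrt(p.soft / p.accel);
    int rung = 0;
    double dt = dtMax;
    while (dt > dtWanted && rung < maxRung) {
        dt *= 0.5;      // each rung halves the timestep
        ++rung;
    }
    return rung;
}

int main() {
    std::vector<Particle> parts = {{1.0e-3, 0.01, 0}, {2.5, 0.01, 0}};
    for (auto& p : parts) {
        p.rung = chooseRung(p, /*dtMax=*/0.1, /*eta=*/0.2, /*maxRung=*/10);
        std::printf("accel=%g -> rung %d\n", p.accel, p.rung);
    }
    return 0;
}
```

Under this scheme, particles on rung r are integrated 2^r times per big step, so most substeps touch only a subset of particles; that uneven activity is the source of the load imbalance that a multiphase load balancer must address.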
This paper focuses on the use of GPGPU-based clusters for hierarchical N-body simulations. Whereas the behavior of these hierarchical methods has been studied in the past on CPU-based architectures, we investigate key performance issues in the context of clusters of GPUs. These include kernel organization and efficiency, the balance between tree traversal and force computation work, grain size selection through the tuning of offloaded work request sizes, and the reduction of sequential bottlenecks. We study the effects of various application parameters and perform experiments to quantify the resulting performance gains. Our studies are carried out in the context of a production-quality parallel cosmological simulator called ChaNGa. We highlight the re-engineering of the application to make it more suitable for GPU-based environments. Finally, we present performance results from experiments on the NCSA Lincoln GPU cluster, including a note on GPU use in multistepped simulations.
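The grain-size tuning mentioned above can be illustrated with a hedged host-side sketch: interactions produced by the tree walk are buffered and offloaded to the GPU only once a tunable work-request size is reached. The WorkRequestBuffer class, the Interaction type, and offloadToGPU are hypothetical stand-ins, not ChaNGa's actual interface.

```cpp
// Host-side sketch of grain-size control via work-request batching.
// offloadToGPU() is a placeholder for the real transfer + kernel launch.
#include <cstddef>
#include <vector>

struct Interaction { int bucket; int node; };   // a bucket/tree-node interaction pair

class WorkRequestBuffer {
public:
    explicit WorkRequestBuffer(std::size_t grainSize) : grainSize_(grainSize) {}

    void add(const Interaction& i) {
        pending_.push_back(i);
        if (pending_.size() >= grainSize_) flush();
    }

    // Send whatever is buffered; also called once the traversal finishes.
    void flush() {
        if (!pending_.empty()) {
            offloadToGPU(pending_);     // asynchronous copy + kernel launch
            pending_.clear();
        }
    }

private:
    static void offloadToGPU(const std::vector<Interaction>&) { /* placeholder */ }

    std::size_t grainSize_;             // too small: launch overhead dominates;
                                        // too large: traversal and GPU stop overlapping
    std::vector<Interaction> pending_;
};

int main() {
    WorkRequestBuffer buf(/*grainSize=*/128);
    for (int b = 0; b < 300; ++b) buf.add({b, b % 7});
    buf.flush();
    return 0;
}
```

The grain size is the tuning knob: very small requests pay launch and transfer overhead per batch, while very large requests delay GPU work until the traversal has produced enough interactions, reducing overlap between the two.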
Cloud computing is emerging as an alternative to supercomputers for some of the high-performance computing (HPC) applications that do not require a fully dedicated machine. With cloud as an additional deployment option, HPC users face the challenge of dealing with highly heterogeneous resources, where the variability spans processor configurations, interconnects, virtualization environments, and pricing rates and models. In this paper, we take a holistic viewpoint to answer the question: why and for whom is cloud a suitable choice for HPC, for what applications, and how should cloud be used for HPC? To this end, we perform a comprehensive performance evaluation and analysis of a set of benchmarks and complex HPC applications on a range of platforms, varying from supercomputers to clouds. Further, we demonstrate HPC performance improvements in the cloud using alternative lightweight virtualization mechanisms, namely thin VMs and OS-level containers, as well as hypervisor- and application-level CPU affinity. Next, we analyze the economic aspects and business models for HPC in clouds; we believe this is an important area that has not been sufficiently addressed by past research. Overall, our results indicate that current public clouds are cost-effective only at small scale for the chosen HPC applications when considered in isolation, but that they can complement supercomputers under business models such as cloud burst and application-aware mapping.
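As one concrete example of the application-level CPU affinity mentioned above, the following minimal Linux sketch pins the calling process to a fixed core. The mapping policy (which rank or thread goes to which core) is an assumption for illustration and is not taken from the paper.

```cpp
// Minimal sketch of application-level CPU affinity on Linux, one mechanism
// for reducing scheduling noise in virtualized environments.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <sched.h>
#include <cstdio>

// Pin the calling thread/process to a single core; returns true on success.
bool pinToCore(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    // pid 0 means "the calling thread"
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

int main() {
    int core = 0;   // e.g. rank % cores_per_node in an MPI program (illustrative)
    if (pinToCore(core))
        std::printf("pinned to core %d (running on CPU %d)\n", core, sched_getcpu());
    else
        std::perror("sched_setaffinity");
    return 0;
}
```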
This paper presents a scheme to optimize the mapping of HPC applications to a set of hybrid dedicated and cloud resources. First, we characterize application performance on dedicated clusters and in the cloud to obtain application signatures. Then, we propose an algorithm to match these signatures to resources such that performance is maximized and cost is minimized. Finally, we show simulation results revealing that, in a concrete scenario, our proposed scheme reduces cost by 60% at only a 10-15% performance penalty compared with a non-optimized configuration. We also find that the execution overhead in the cloud can be reduced to a negligible level using thin hypervisors or OS-level containers.
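A hedged sketch of the kind of signature-to-resource matching described above: for each application, pick the cheapest platform whose predicted slowdown stays within an allowed performance penalty. The platform names, rates, and slowdown factors are invented example values, and the greedy rule is an illustration rather than the paper's actual algorithm.

```cpp
// Illustrative matching of application "signatures" to platforms:
// minimize cost subject to a bound on the performance penalty.
#include <cstdio>
#include <limits>
#include <string>
#include <vector>

struct Platform {
    std::string name;
    double costPerHour;     // pricing rate (example values)
    double slowdown;        // predicted runtime relative to the fastest option (>= 1.0)
};

struct App {
    std::string name;
    double baseHours;       // runtime on the fastest platform (the "signature")
};

// Cheapest platform whose penalty does not exceed maxPenalty (e.g. 0.15 = 15%).
const Platform* choosePlatform(const App& app, const std::vector<Platform>& ps,
                               double maxPenalty) {
    const Platform* best = nullptr;
    double bestCost = std::numeric_limits<double>::max();
    for (const auto& p : ps) {
        if (p.slowdown - 1.0 > maxPenalty) continue;            // too slow
        double cost = app.baseHours * p.slowdown * p.costPerHour;
        if (cost < bestCost) { bestCost = cost; best = &p; }
    }
    return best;
}

int main() {
    std::vector<Platform> platforms = {
        {"supercomputer", 1.00, 1.00},
        {"cloud-thinVM",  0.30, 1.12},
        {"cloud-heavyVM", 0.25, 1.40},
    };
    App app{"example-app", 10.0};
    if (const Platform* p = choosePlatform(app, platforms, 0.15))
        std::printf("%s -> %s\n", app.name.c_str(), p->name.c_str());
    return 0;
}
```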