We present an analysis of the performance of an atmospheric general circulation model at the ultra-high resolution required to resolve individual cloud systems, and we describe alternative technological paths to realizing such a model in the relatively near future. Because the Courant stability criterion dictates a superlinear scaling of the computational burden with resolution, the solution of the equations of motion dominates the calculation at these ultra-high resolutions. From this extrapolation, we estimate that a credible kilometer-scale atmospheric model would require a sustained computational rate of at least 28 Petaflop/s to deliver scientifically useful climate simulations. Our design study points to an alternative strategy for practical, power-efficient implementations of next-generation ultra-scale systems. We demonstrate that hardware/software co-design of low-power embedded processor technology could be exploited to build a custom machine tailored to ultra-high-resolution climate model specifications at relatively affordable cost and within a practical power budget. A strawman machine design is presented, consisting of more than 20 million processing elements, that effectively exploits forthcoming many-core chips. The system pushes the limits of domain decomposition to increase explicit parallelism, and it suggests that functional partitioning of sub-components of the climate code (much like the coarse-grained partitioning of computation among the atmospheric, ocean, land, and ice components of current coupled models) may be necessary for future performance scaling.
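To make the Courant-driven scaling argument concrete, the sketch below extrapolates sustained-rate requirements under the common assumption that halving the horizontal grid spacing quadruples the number of grid columns and, via the Courant criterion, halves the allowable timestep, so cost grows with the cube of the refinement factor. The baseline resolution and rate here are hypothetical placeholders, not figures taken from the study.

```python
# Back-of-envelope extrapolation of sustained compute requirements for an
# atmospheric model under Courant-limited explicit timestepping.
# Assumption: cost per simulated day ~ (ref_dx / dx)**3 relative to a
# reference run (4x horizontal points plus 2x more timesteps per halving).

def sustained_rate_tflops(dx_km, ref_dx_km=100.0, ref_rate_tflops=0.1):
    """Sustained rate (Tflop/s) needed to keep the same
    simulated-years-per-wallclock-day as the reference configuration.
    ref_dx_km and ref_rate_tflops are illustrative placeholders."""
    refinement = ref_dx_km / dx_km
    return ref_rate_tflops * refinement**3

for dx in (100.0, 25.0, 10.0, 2.0, 1.0):
    print(f"dx = {dx:6.1f} km -> ~{sustained_rate_tflops(dx):,.0f} Tflop/s sustained")
```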
Abstract. In the context of the block Cimmino algorithm, we study preprocessing strategies for obtaining block partitionings that can be applied to general linear systems of equations Ax = b. We study strategies that transform the matrix AA^T into a matrix with a block tridiagonal structure. Because block Cimmino is essentially block Jacobi applied to the normal equations, this yields a two-block partition of the original matrix and thus a partitioning of the linear system suited to row projection methods; the resulting block partitioning should improve the rate of convergence of block row projection methods such as block Cimmino. We also discuss a dropping strategy that yields more blocks at the cost of relaxing the two-block structure. We then use a hypergraph partitioning that works directly on the matrix A to reduce the connections between blocks. We give numerical results showing the performance of these techniques, both in their effect on the convergence of the block Cimmino algorithm and in their ability to exploit parallelism.
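For readers unfamiliar with the method, the toy sketch below implements the basic block Cimmino iteration for a given row partition, x_{k+1} = x_k + ω Σ_i A_i^+ (b_i − A_i x_k). It is a minimal illustration only; the partition itself, which is the subject of the abstract above, is taken as an input rather than computed by any of the preprocessing strategies studied.

```python
import numpy as np

def block_cimmino(A, b, blocks, omega=1.0, iters=200):
    """Toy block Cimmino iteration: sum the minimum-norm corrections from
    each row block and apply them with relaxation parameter omega.
    `blocks` is a list of row-index arrays partitioning the rows of A."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        update = np.zeros_like(x)
        for rows in blocks:
            Ai, bi = A[rows], b[rows]
            # Correction for this block: Ai^+ (bi - Ai x)
            update += np.linalg.pinv(Ai) @ (bi - Ai @ x)
        x += omega * update
    return x

# Usage on a small consistent system split into two row blocks.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
x_true = rng.standard_normal(6)
b = A @ x_true
blocks = [np.arange(0, 4), np.arange(4, 8)]
x = block_cimmino(A, b, blocks, omega=0.5, iters=2000)
print(np.linalg.norm(x - x_true))  # should be small
```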
The ACTS Collection brings together a number of general-purpose computational tools developed by independent research projects, mostly funded and supported by the U.S. Department of Energy. These tools tackle computational issues common to many applications, chiefly the implementation of numerical algorithms and support for code development, execution, and optimization. In this article, we introduce the numerical tools in the collection and their functionalities, present a model for developing more complex computational applications on top of ACTS tools, and summarize applications that use them. Lastly, we present the ACTS project's vision for deployment of the collection across the computational sciences community.
Abstract. This paper presents a toolkit for managing distributed communication in multi-application systems targeted at high performance computing environments: the Distributed Data Broker (DDB). The DDB provides a flexible mechanism for coupling codes with different grid resolutions and data representations. The target applications are coupled systems that exchange large volumes of data and/or are computationally expensive. These application codes need to run efficiently in massively parallel computing environments, creating a need for distributed coupling that minimizes long synchronization points. Furthermore, with the DDB, coupling is realized in a plug-in manner rather than by hard-wired inclusion of programming language statements. The DDB's performance on the CRAY T3E-600 and T3E-900 systems is examined.
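To illustrate the plug-in coupling idea in the abstract above, here is a hypothetical sketch of a broker-style interface. The names (`DataBroker`, `register_field`, `put`, `get`) and the subsampling regridder are illustrative inventions for this sketch, not the DDB API.

```python
# Hypothetical sketch of a broker-style coupler: producers publish fields,
# consumers fetch them, and the broker applies a user-supplied regridding
# function between the two grids, so neither model hard-wires the other.

class DataBroker:
    def __init__(self):
        self._fields = {}   # field name -> latest data on the producer grid
        self._regrid = {}   # (field name, consumer) -> regridding callable

    def register_field(self, name, consumer, regrid_fn):
        """A consumer registers once, supplying how to map the producer's
        grid onto its own; no coupling statements appear in either model."""
        self._regrid[(name, consumer)] = regrid_fn

    def put(self, name, data):
        self._fields[name] = data

    def get(self, name, consumer):
        return self._regrid[(name, consumer)](self._fields[name])

# Usage: an "atmosphere" publishes surface temperature; an "ocean" consumes
# it on a coarser grid via simple subsampling (a placeholder regridder).
broker = DataBroker()
broker.register_field("t_surf", "ocean", lambda d: d[::2])
broker.put("t_surf", list(range(10)))
print(broker.get("t_surf", "ocean"))  # -> [0, 2, 4, 6, 8]
```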