SUMMARYThe goal of this study is to adapt the multiscale fluid solver EULerian or LAGrangian framewrok (EULAG) to future graphics processing units (GPU) platforms. The EULAG model has the proven record of successful applications, and excellent efficiency and scalability on conventional supercomputer architectures. Currently, the model is being implemented as the new dynamical core of the COSMO weather prediction framework. Within this study, two main modules of EULAG, namely the multidimensional positive definite advection transport algorithm (MPDATA) and the variational generalized conjugate residual, elliptic pressure solver Generalized Conjugate Residual (GCR) are analyzed and optimized. In this paper, a method is proposed, which ensures a comprehensive analysis of the resource consumption including registers, shared, and global memories. This method allows us to identify bottlenecks of the algorithm, including data transfers between host and global memory, global and shared memories, as well as GPU occupancy. We put the emphasis on providing a fixed memory access pattern, padding as well as organizing computation in the MPDATA algorithm. The testing and validation of the new GPU implementation have been carried out based on modeling decaying turbulence of a homogeneous incompressible fluid in a triply-periodic cube. Simulations performed using the standard version of EULAG and its new GPU implementation give similar solutions. Preliminary results show a promising increase in terms of computational efficiency.
Validation, verification, and uncertainty quantification (VVUQ) of simulation workflows are essential for building trust in simulation results, and their increased use in decision‐making processes. The EasyVVUQ Python library is designed to facilitate implementation of advanced VVUQ techniques in new or existing workflows, with a particular focus on high‐performance computing, middleware agnosticism, and multiscale modeling. Here, the application of EasyVVUQ to five very diverse application areas is demonstrated: materials properties, ocean circulation modeling, fusion reactors, forced human migration, and urban air quality prediction.
Multiscale simulations are an essential computational method in a range of research disciplines, and provide unprecedented levels of scientific insight at a tractable cost in terms of effort and compute resources. To provide this, we need such simulations to produce results that are both robust and actionable. The VECMA toolkit (VEC-MAtk), which is officially released in conjunction with the present paper, establishes a platform to achieve this by exposing patterns for verification, validation and uncertainty quantification (VVUQ). These patterns can be combined to capture complex scenarios, applied to applications in disparate domains, and used to run multiscale simulations on any desktop, cluster or supercomputing platform.
Abstract. The Discrete Wavelet Transform (DWT) has gained the momentum in signal processing and image compression over the last decade bringing the concept up to the level of new image coding standard JPEG2000. Thanks to many added values in DWT, in particular inherent multi-resolution nature, wavelet-coding schemes are suitable for various applications where scalability and tolerable degradation are relevant. Moreover, as we demonstrate in this paper, it can be used as a perfect benchmarking procedure for more sophisticated data compression and multimedia applications using General Purpose Graphical Processor Units (GPGPUs). Thus, in this paper we show and compare experiments performed on reference implementations of DWT on Cell Broadband Engine Architecture (Cell B.E) and nVidia Graphical Processing Units (GPUs). The achieved results show clearly that although both GPU and Cell B.E. are being considered as representatives of the same hybrid architecture devices class they di er greatly in programming style and optimization techniques that need to be taken into account during the development. In order to show the speedup, the parallel algorithm has been compared to sequential computation performed on the x86 architecture.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.