Summary MONC is a highly scalable modelling tool for the investigation of atmospheric flows, turbulence, and cloud microphysics. Typical simulations produce very large amounts of raw data, which must then be analysed for scientific investigation. For performance and scalability reasons, this analysis and subsequent writing to disk should be performed in situ on the data as it is generated; however, one does not wish to pause the computation whilst analysis is carried out. In this paper, we present the analytics approach of MONC, where cores of a node are shared between computation and data analytics. By asynchronously sending their data to an analytics core, the computational cores can run continuously without having to pause for data writing or analysis. We describe our IO server framework and analytics workflow, which is highly asynchronous, along with solutions to the challenges that this approach raises and the performance implications of some common configuration choices. The result of this work is a highly scalable analytics approach, and we illustrate, on up to 32 768 computational cores of a Cray XC30, that enabling data analytics in MONC has minimal impact on runtime; we also investigate the performance and suitability of our approach on the KNL.

… convection scheme,5,6 and cloud microphysics.7,8 The simulations that these models run generate a significant amount of raw data; it is not this raw data itself that scientists are most interested in, but rather the higher level information that results from analysing it. Previous generations of models, such as the LEM, which exhibited very limited parallel scalability, were able to perform this data analysis either by writing raw data to a file and analysing it offline, or by doing it in-line with the computation without much impact on performance. However, as modern models such as MONC open up the possibility of routinely running very large simulations on many thousands of cores, it is no longer feasible, for performance and scalability reasons, to write this raw data to file and analyse it offline, or to stop the computation whilst analysis is performed in-line. This situation is likely to become more severe as we move towards exa-scale and run these models on hundreds of thousands of cores.

In this paper, we introduce the data analysis framework and implementation that we have developed for MONC, in which some cores of a processor run our IO server and are used for data analysis instead of computation. The computational cores "fire and forget" their raw data to a corresponding IO server, which then performs the analysis and any required IO. To support this "fire and forget" approach, where the computational cores can be kept busy doing their own work, the IO server is highly asynchronous and has to deal with different data arriving at different times, which raises specific challenges. After discussing the context of MONC and related work by the community in more detail in Section 2, Section 3 then focuses on our IO server, the analytics workflow…
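As a rough illustration of the "fire and forget" pattern described above, and not MONC's actual implementation (which is written in Fortran with its own IO-server protocol), the sketch below shows compute ranks posting a non-blocking MPI send of their raw field data to a designated IO-server rank and immediately returning to time-stepping, while the IO-server rank drains arriving messages in whatever order they land and performs a simple reduction as a stand-in for the real analysis. The rank mapping, message tag, and field size are illustrative assumptions.

```cpp
// Minimal sketch of the "fire and forget" pattern: compute ranks send their
// data asynchronously to an IO-server rank and carry on; the IO-server rank
// receives and analyses whatever arrives, in any order.
#include <mpi.h>
#include <vector>
#include <cstdio>

static const int FIELD_TAG = 42;          // assumed message tag
static const int FIELD_SIZE = 1 << 20;    // assumed local field size

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int io_rank = size - 1;         // assumption: last rank acts as IO server

    if (rank == io_rank) {
        // IO server: receive one message per compute rank and "analyse" it.
        for (int received = 0; received < size - 1; ++received) {
            MPI_Status status;
            std::vector<double> field(FIELD_SIZE);
            MPI_Recv(field.data(), FIELD_SIZE, MPI_DOUBLE, MPI_ANY_SOURCE,
                     FIELD_TAG, MPI_COMM_WORLD, &status);
            double mean = 0.0;
            for (double v : field) mean += v;
            mean /= FIELD_SIZE;
            std::printf("analysed field from rank %d, mean=%f\n",
                        status.MPI_SOURCE, mean);
        }
    } else {
        // Compute rank: fire the data off and return to time-stepping.
        std::vector<double> field(FIELD_SIZE, rank * 1.0);
        MPI_Request request;
        MPI_Isend(field.data(), FIELD_SIZE, MPI_DOUBLE, io_rank,
                  FIELD_TAG, MPI_COMM_WORLD, &request);
        // ... continue time-stepping here; only wait when the buffer is reused.
        MPI_Wait(&request, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```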
The use of Field Programmable Gate Arrays (FPGAs) to accelerate computational kernels has the potential to be of great benefit to scientific codes and the HPC community in general. With recent developments in FPGA programming technology, the ability to port kernels is becoming far more accessible. However, to gain reasonable performance from this technology, it is not enough to simply transfer a code onto the FPGA; instead, the algorithm must be rethought and recast in a dataflow style to suit the target architecture. In this paper, we describe the porting, via HLS, of one of the most computationally intensive kernels of the Met Office NERC Cloud model (MONC), an atmospheric model used by climate and weather researchers, onto an FPGA. We describe in detail the steps taken to adapt the algorithm to make it suitable for the architecture and the impact this has on kernel performance. Using a PCIe-mounted FPGA with on-board DRAM, we consider the integration of this kernel within a larger infrastructure and explore the performance characteristics of our approach, over problem sizes involving very large grids, in contrast to the Intel CPUs that are popular in modern HPC machines. The result of this work is an experience report detailing the challenges faced and lessons learnt in porting this complex computational kernel to FPGAs, as well as exploring the role that FPGAs can play, and their fundamental limits, in accelerating traditional HPC workloads.
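To make the idea of recasting an algorithm in a dataflow style more concrete, the following is a minimal, hypothetical HLS sketch, not the actual MONC advection kernel: a three-point stencil that streams its input through a small on-chip sliding window, so each pipelined loop iteration reads exactly one new value from external memory and, once the pipeline fills, produces one result per cycle. The pragma set assumes a Vivado/Vitis-HLS-style toolchain.

```cpp
// Illustrative dataflow-style stencil for an HLS tool (not the MONC kernel).
// A sliding window held on-chip avoids re-reading neighbours from DRAM, and
// the loop is pipelined with an initiation interval of one.
extern "C" void stencil_kernel(const float* in, float* out, int n) {
#pragma HLS INTERFACE m_axi port=in  bundle=gmem0
#pragma HLS INTERFACE m_axi port=out bundle=gmem1

    float window[3] = {0.0f, 0.0f, 0.0f};   // last three input values

    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        // Shift the window and read exactly one new value per iteration.
        window[0] = window[1];
        window[1] = window[2];
        window[2] = in[i];
        if (i >= 2) {
            out[i - 1] = 0.25f * window[0] + 0.5f * window[1] + 0.25f * window[2];
        }
    }
}
```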
Summary The oil and gas industry is awash with sub-surface data, which is used to characterize the rock and fluid properties beneath the seabed. This drives commercial decision making and exploration, but the industry relies upon highly manual workflows when processing data. A question is whether this can be improved using machine learning, complementing the activities of petrophysicists searching for hydrocarbons. In this paper, we present work using supervised learning with the aim of decreasing the petrophysical interpretation time from over 7 days to 7 minutes. We describe the use of mathematical models, trained on raw well log data, to complete each of the four stages of a petrophysical interpretation workflow, in addition to initial data cleaning. We explore how the predictions from these models compare against the interpretations of human petrophysicists, along with the numerous options and techniques that were used to optimize the models. The result of this work is the ability, for the first time, to use machine learning for the entire petrophysical workflow.
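As a toy illustration of the supervised-learning idea only, and emphatically not the models, features, or data used in this work, the sketch below uses a simple k-nearest-neighbour regressor to predict a porosity-like label from two raw well-log curves. The curve names, the value of k, and the handful of samples are invented for illustration; in practice the inputs would be normalised and drawn from full depth-indexed logs.

```cpp
// Toy supervised-learning sketch: predict a porosity-like value from two
// hypothetical well-log curves using k-nearest-neighbour regression.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct LogSample {
    double gamma_ray;     // assumed input log
    double bulk_density;  // assumed input log
    double porosity;      // assumed label from a human interpretation
};

double predict_porosity(const std::vector<LogSample>& train,
                        double gamma_ray, double bulk_density, int k) {
    // Distance of every training sample to the query point in feature space.
    std::vector<std::pair<double, double>> dist_label;
    for (const LogSample& s : train) {
        double d = std::hypot(s.gamma_ray - gamma_ray,
                              s.bulk_density - bulk_density);
        dist_label.push_back({d, s.porosity});
    }
    // Average the labels of the k nearest neighbours.
    std::partial_sort(dist_label.begin(), dist_label.begin() + k, dist_label.end());
    double sum = 0.0;
    for (int i = 0; i < k; ++i) sum += dist_label[i].second;
    return sum / k;
}

int main() {
    // Invented samples standing in for depth-indexed well log data.
    std::vector<LogSample> train = {
        {45.0, 2.30, 0.22}, {60.0, 2.45, 0.15}, {80.0, 2.55, 0.08},
        {50.0, 2.35, 0.20}, {70.0, 2.50, 0.11}, {55.0, 2.40, 0.17},
    };
    double phi = predict_porosity(train, 52.0, 2.37, 3);
    std::printf("predicted porosity: %.3f\n", phi);
    return 0;
}
```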
Large Eddy Simulation is a critical modelling tool for the investigation of atmospheric flows, turbulence, and cloud microphysics. The models used by the UK atmospheric research community are homogeneous (CPU only), and the latest model, MONC, is designed to run on substantial HPC systems with very high CPU core counts. In order to future-proof these codes, it is worth investigating other technologies and architectures which might support the community in running these codes at the exa-scale. In this paper, we present a hybrid version of MONC, where the most computationally intensive aspect is offloaded to the GPU while the rest of the functionality runs concurrently on the CPU. The hybrid model is developed using the directive-driven OpenACC, and we consider the suitability and maturity of this technology for modern Fortran scientific codes, as well as general software engineering techniques which aid this type of porting work. The performance of our hybrid model at scale is compared against the CPU version, before considering other tuning options and making a comparison between the energy usage of the homogeneous and heterogeneous versions. The result of this work is a promising hybrid model that shows the performance benefits of our approach when the GPU has a significant computational workload, and which can be applied not only to the MONC model but also to other weather and climate simulations in use by the community.
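A minimal sketch of the hybrid pattern described above, assuming a C/C++ OpenACC toolchain rather than MONC's Fortran: the expensive loop nest is launched asynchronously on the GPU so the host can continue with the remaining model components, and the two streams of work meet at an explicit wait before the result is consumed. The array names and the stand-in stencil are illustrative only.

```cpp
// Illustrative OpenACC sketch: offload the heavy kernel asynchronously while
// the CPU keeps working, then synchronise before using the result.
#include <vector>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    std::vector<float> field(n, 1.0f), tendency(n, 0.0f);
    float* f = field.data();
    float* t = tendency.data();

    // Offload the expensive kernel; async(1) lets the CPU continue immediately.
    #pragma acc parallel loop async(1) copyin(f[0:n]) copy(t[0:n])
    for (int i = 1; i < n - 1; ++i) {
        t[i] = 0.5f * (f[i + 1] - f[i - 1]);   // stand-in advection-like stencil
    }

    // ... meanwhile the CPU can run the remaining (cheaper) model components ...

    #pragma acc wait(1)   // synchronise before the tendency is consumed
    std::printf("t[10] = %f\n", t[10]);
    return 0;
}
```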
The use of reconfigurable computing, and FPGAs in particular, to accelerate computational kernels has the potential to be of great benefit to scientific codes and the HPC community in general. However, whilst recent advances in FPGA tooling have made the physical act of programming reconfigurable architectures much more accessible, in order to gain good performance the entire algorithm must be rethought and recast in a dataflow style. Reducing the cost of data movement is critically important for all computing devices, and in this paper we explore the most appropriate techniques for FPGAs. We do this by describing the optimisation of an existing FPGA implementation of an atmospheric model's advection scheme. Taking an FPGA code that was over four times slower than running on the CPU, mainly due to data movement overhead, we describe the profiling and optimisation strategies adopted to significantly reduce the runtime and bring the performance of our FPGA kernels to a much more practical level for real-world use. The result of this work is a set of techniques, steps, and lessons learnt that we have found significantly improve the performance of FPGA-based HPC codes and that others can adopt in their own codes to achieve similar results.
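One widely used way of cutting data-movement cost on FPGAs, shown below as a sketch under the assumption of a Xilinx-style HLS toolchain and not as the paper's actual advection code, is to split a kernel into load, compute, and store stages connected by on-chip streams inside a dataflow region, so that external DRAM is only ever touched through long contiguous bursts while the compute stage works entirely from on-chip data.

```cpp
// Illustrative load/compute/store dataflow pattern for reducing DRAM traffic
// in an HLS kernel (assumes a Xilinx-style toolchain providing hls_stream.h).
#include "hls_stream.h"

static void load(const float* in, hls::stream<float>& s, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        s.write(in[i]);               // sequential reads become DRAM bursts
    }
}

static void compute(hls::stream<float>& in, hls::stream<float>& out, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out.write(2.0f * in.read());  // stand-in for the real computation
    }
}

static void store(hls::stream<float>& s, float* out, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = s.read();            // sequential writes become DRAM bursts
    }
}

extern "C" void advect(const float* in, float* out, int n) {
#pragma HLS INTERFACE m_axi port=in  bundle=gmem0
#pragma HLS INTERFACE m_axi port=out bundle=gmem1
#pragma HLS DATAFLOW
    hls::stream<float> s_in, s_out;
    load(in, s_in, n);
    compute(s_in, s_out, n);
    store(s_out, out, n);
}
```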