The climate sciences are among the scientific communities generating the largest amounts of data today. New climate models enable integration at unprecedented resolution, simulating decades and centuries of climate change, including many complex interactions in the Earth system, under different scenarios. Previously, the CPU-intensive numerical integrations used to be the bottleneck; nowadays, limited storage space in the face of ever-increasing model output is the bigger challenge. The number of variables stored for post-processing analysis has to be limited to keep data volumes manageable. For this reason, we look at lossless compression of climate data to make better use of the available storage space. More specifically, we investigate prediction-based data compression, in which data is processed in a predefined sequence and a prediction is provided for each data point based on the prior data in that sequence. We show that the compression ratio depends significantly on the chosen traversal method and the underlying spatiotemporal data model. We examine the influence of this structural dependency on compression algorithms and explore possibilities to exploit this information to improve compression ratios. To do this, we introduce the concept of Information Spaces (IS), which improves the predictions made by individual predictors by nearly 10% on average. More importantly, the standard deviation of the compression results is decreased by over 20% on average. The use of IS thus provides better predictions and more consistent compression ratios. Furthermore, it opens up options for consolidation and fine-granular tuning of predictions that are not possible with many common approaches used today.
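The abstract does not include an implementation, so the following Python snippet is only a minimal sketch of the general idea: a hypothetical last-value predictor and residual encoding (not the predictors or the Information Spaces concept from the paper) illustrate why the traversal order over a spatiotemporal field matters for prediction-based compression.

```python
import numpy as np

def predict_last_value(prior):
    """Simple illustrative predictor: predict the next value as the most
    recently seen value in the traversal; 0.0 when no prior data exists."""
    return prior[-1] if prior else 0.0

def residuals_along(field, traversal):
    """Walk the field in the given traversal order, predict each point from
    the already-seen prefix, and keep only the prediction residuals."""
    seen, residuals = [], []
    for idx in traversal:
        value = field[idx]
        residuals.append(value - predict_last_value(seen))
        seen.append(value)
    return np.array(residuals)

# Toy 2D field traversed row-major vs. column-major: smaller, more regular
# residuals are cheaper for a back-end entropy coder, so the traversal order
# directly influences the achievable compression ratio.
field = np.arange(12, dtype=np.float64).reshape(3, 4)
row_major = [(i, j) for i in range(3) for j in range(4)]
col_major = [(i, j) for j in range(4) for i in range(3)]
print(residuals_along(field, row_major))
print(residuals_along(field, col_major))
```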
Through the introduction of next-generation models, the climate sciences have experienced a breakthrough in high-resolution simulations. In the past, the bottleneck was the numerical complexity of the models; nowadays, it is the storage space required for the model output. One way to tackle the data storage challenge is data compression. In this article we introduce a modular framework for the compression of structured climate data. The framework supports the creation of individual predictors, which can be customised and adjusted to the data at hand. It provides interfaces and customisable components, which serve as building blocks for individualised compression modules optimised for particular applications. Furthermore, the framework offers additional features such as benchmarks and validity tests for both sequential and parallel execution of compression algorithms. CCS Concepts: • Software and its engineering → Software design engineering; Software design techniques; Reusability; Designing software.
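The abstract names no concrete API, so the sketch below is purely an assumption about what such a modular design could look like: a hypothetical Predictor base class and a trivial example component, composed into a residual-only encoder.

```python
from abc import ABC, abstractmethod

class Predictor(ABC):
    """Hypothetical building block of a modular compression pipeline:
    each predictor estimates the next value from previously seen data."""

    @abstractmethod
    def predict(self) -> float:
        """Return the prediction for the next data point."""

    @abstractmethod
    def update(self, actual: float) -> None:
        """Feed the true value back so the internal state can adapt."""

class LastValuePredictor(Predictor):
    """Trivial example component: always predicts the previous value."""

    def __init__(self) -> None:
        self.last = 0.0

    def predict(self) -> float:
        return self.last

    def update(self, actual: float) -> None:
        self.last = actual

def encode(data, predictor: Predictor):
    """Compose a predictor into a compression module: store residuals only."""
    residuals = []
    for value in data:
        residuals.append(value - predictor.predict())
        predictor.update(value)
    return residuals

print(encode([1.0, 1.5, 2.0, 2.5], LastValuePredictor()))
```

Swapping in a different Predictor subclass would leave the encoder untouched, which is the kind of customisation the abstract's modular framework appears to target.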
Significant increases in computational resources have enabled the development of more complex and spatially better resolved weather and climate models. As a result, the amount of output generated by data assimilation systems and by weather and climate simulations is rapidly increasing, e.g. due to higher spatial resolution, more realisations, and higher-frequency data. However, while compute performance has increased significantly thanks to better-scaling program code and a growing number of cores, storage capacity is increasing only slowly. One way to tackle the data storage problem is data compression. Here, we build the groundwork for an environmental data compressor by improving compression for established weather and climate indices such as the El Niño Southern Oscillation (ENSO), the North Atlantic Oscillation (NAO) and the Quasi-Biennial Oscillation (QBO). We investigate options for compressing these indices using a statistical method based on the Auto-Regressive Integrated Moving Average (ARIMA) model. The introduced adaptive approach shows that the accuracy of lossily compressed data can be improved by preserving selected data with higher precision. Our analysis reveals no potential for lossless compression of these indices. However, as the ARIMA model is able to capture all relevant temporal variability, lossless compression is not necessary and lossy compression is acceptable. The reconstruction based on the lossily compressed data reproduces the chosen indices to such a high degree that the statistically relevant information needed for describing climate dynamics is preserved. The performance of the (seasonal) ARIMA model was tested with daily and monthly indices.
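As a rough illustration of ARIMA-based lossy compression of an index time series (not the paper's exact procedure: the synthetic series, the ARIMA order (1, 1, 1) and the residual threshold are all assumptions for the example), one could fit the model with statsmodels and retain only the model parameters plus the few residuals that exceed a chosen precision, reconstructing everything else from the model.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for a monthly climate index (e.g. an NAO-like series).
rng = np.random.default_rng(0)
index = np.cumsum(rng.normal(0.0, 0.5, size=240))

# Fit an assumed ARIMA(1, 1, 1); the paper tunes (seasonal) orders per index.
fit = ARIMA(index, order=(1, 1, 1)).fit()
prediction = np.asarray(fit.fittedvalues)

# Adaptive idea sketched in the abstract: keep values whose residual exceeds
# a precision threshold exactly, and reconstruct the rest from the model.
threshold = 0.5
residuals = index - prediction
keep = np.abs(residuals) > threshold

reconstruction = prediction.copy()
reconstruction[keep] = index[keep]

print(f"values stored exactly: {keep.sum()} of {len(index)}")
print(f"max reconstruction error: {np.max(np.abs(index - reconstruction)):.3f}")
```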