Geochemical maps provide invaluable evidence to guide decisions on issues of mineral exploration, agriculture, and environmental health. However, the high cost of chemical analysis means that the ground sampling density will always be limited. Traditionally, geochemical maps have been produced through the interpolation of measured element concentrations between sample sites using models based on the spatial autocorrelation of data (e.g. semivariogram models for ordinary kriging). In their simplest form such models fail to consider potentially useful auxiliary information about the region, and the accuracy of the maps may suffer as a result. In contrast, this study uses quantile regression forests (an elaboration of random forest) to investigate the potential of high-resolution auxiliary information alone to support the generation of accurate and interpretable geochemical maps. This paper presents a summary of the performance of quantile regression forests in predicting element concentrations, loss on ignition and pH in the soils of south west England using high-resolution remote sensing and geophysical survey data. Through stratified 10-fold cross-validation we find the accuracy of quantile regression forests in predicting soil geochemistry in south west England to be a general improvement over that offered by ordinary kriging. Concentrations of immobile elements whose distributions are most tightly controlled by bedrock lithology are predicted with the greatest accuracy (e.g. Al with a cross-validated R² of 0.79), while concentrations of more mobile elements prove harder to predict.
In addition to providing a high level of prediction accuracy, models built on high-resolution auxiliary variables allow for informative, process-based interpretations to be made. In conclusion, this study has highlighted the ability to map and understand the surface environment with greater accuracy and detail than previously possible by combining information from multiple datasets. As the quality and coverage of remote sensing and geophysical surveys continue to improve, machine learning methods will provide a means to interpret the otherwise-uninterpretable.
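The evaluation described above can be sketched in miniature. The snippet below shows k-fold cross-validated R², the accuracy measure reported in the abstract; as a stand-in for quantile regression forests it uses a simple nearest-neighbour predictor, and the covariates and response are synthetic placeholders, not the south west England survey data.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(size=(300, 2))            # stand-in auxiliary covariates
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

def nn_predict(X_train, y_train, X_test):
    """Predict each test point with its nearest training neighbour."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[d.argmin(axis=1)]

def cv_r2(X, y, k=10):
    """k-fold cross-validated R^2: hold out each fold, predict it from the rest."""
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty_like(y)
    for fold in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False
        preds[fold] = nn_predict(X[mask], y[mask], X[fold])
    return 1.0 - ((y - preds) ** 2).sum() / ((y - y.mean()) ** 2).sum()

r2 = cv_r2(X, y)
print(f"10-fold CV R^2: {r2:.2f}")
```

In the study itself, each fold's held-out sites are predicted by a model trained only on the remaining sites, so the reported R² reflects accuracy at unsampled locations rather than fit to the training data.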
The most mature aspect of applying artificial intelligence (AI)/machine learning (ML) to problems in the atmospheric sciences is likely post-processing of model output. This article provides some history and current state of the science of post-processing with AI for weather and climate models. Deriving from the discussion at the 2019 Oxford workshop on Machine Learning for Weather and Climate, this paper also presents thoughts on medium-term goals to advance such use of AI, which include assuring that algorithms are trustworthy and interpretable, adherence to FAIR data practices to promote usability, and development of techniques that leverage our physical knowledge of the atmosphere. The coauthors propose several actionable items and have initiated one of those: a repository for datasets from various real weather and climate problems that can be addressed using AI. Five such datasets are presented and permanently archived, together with Jupyter notebooks to process them and assess the results in comparison with a baseline technique. The coauthors invite the readers to test their own algorithms in comparison with the baseline and to archive their results. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
Forecasting the weather is an increasingly data-intensive exercise. Numerical weather prediction (NWP) models are becoming more complex, with higher resolutions, and there are increasing numbers of different models in operation. While the forecasting skill of NWP models continues to improve, the number and complexity of these models poses a new challenge for the operational meteorologist: how should the information from all available models, each with their own unique biases and limitations, be combined in order to provide stakeholders with well-calibrated probabilistic forecasts to use in decision making? In this paper, we use a road surface temperature example to demonstrate a three-stage framework that uses machine learning to bridge the gap between sets of separate forecasts from NWP models and the ‘ideal’ forecast for decision support: probabilities of future weather outcomes. First, we use quantile regression forests to learn the error profile of each numerical model, and use these to apply empirically derived probability distributions to forecasts. Second, we combine these probabilistic forecasts using quantile averaging. Third, we interpolate between the aggregate quantiles in order to generate a full predictive distribution, which we demonstrate has properties suitable for decision support. Our results suggest that this approach provides an effective and operationally viable framework for the cohesive post-processing of weather forecasts across multiple models and lead times to produce a well-calibrated probabilistic output. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
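Stages two and three of the framework above can be sketched directly. The per-model quantiles below are illustrative placeholders (stage one would produce them from quantile regression forests trained on each model's error profile); the code then averages the quantile functions level by level (quantile averaging, also called Vincentization) and linearly interpolates between the aggregate quantiles to give an approximate full predictive distribution.

```python
import numpy as np

quantile_levels = np.array([0.1, 0.25, 0.5, 0.75, 0.9])

# Stage 1 output (assumed): per-model predictive quantiles of road
# surface temperature in degrees C. These values are placeholders.
model_a_q = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
model_b_q = np.array([2.0, 2.5, 3.0, 3.5, 4.0])

# Stage 2: quantile averaging -- average the quantile functions level
# by level, rather than averaging the probability densities.
combined_q = (model_a_q + model_b_q) / 2.0

# Stage 3: interpolate between the aggregate quantiles to obtain an
# approximate quantile function (inverse CDF) for decision support.
def predictive_quantile(p):
    return np.interp(p, quantile_levels, combined_q)

print(predictive_quantile(0.5))   # median of the combined forecast
```

Averaging quantiles rather than densities keeps the combined forecast sharp: the mixture of two densities is typically wider than either component, whereas the quantile average sits between them.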
Predictive mapping of indoor radon potential often requires the use of additional datasets. A range of geological, geochemical and geophysical data may be considered, either individually or in combination. The present work is an evaluation of how much of the indoor radon variation in south west England can be explained by four different datasets: a) the geology (G), b) the airborne gamma-ray spectroscopy (AGR), c) the geochemistry of topsoil (TSG) and d) the geochemistry of stream sediments (SSG). The study area was chosen since it provides a large indoor radon dataset (197,464 measurements) in association with the above information. Geology provides information on the distribution of the materials that may contribute to radon release while the latter three items provide more direct observations on the distributions of the radionuclide elements uranium (U), thorium (Th) and potassium (K). In addition, (c) and (d) provide multi-element assessments of geochemistry which are also included in this study. The effectiveness of datasets for predicting the existing indoor radon data is assessed through the level (the higher the better) of explained variation (% of variance or ANOVA) obtained from the tested models. A multiple linear regression using a compositional data (CODA) approach is carried out to obtain the required measure of determination for each analysis. Results show that, amongst the four tested datasets, the soil geochemistry (TSG, i.e. including all the available 41 elements, 10 major - Al, Ca, Fe, K, Mg, Mn, Na, P, Si, Ti - plus 31 trace) provides the highest explained variation of indoor radon (about 40%); more than double the value provided by U alone (ca. 15%), or the sub-composition U, Th, K (ca. 16%) from the same TSG data. The remaining three datasets provide values ranging from about 27% to 32.5%. The enhanced prediction of the AGR model relative to the U, Th, K in soils suggests that the AGR signal captures more than just the U, Th and K content in the soil.
The best result is obtained by including the soil geochemistry with geology and AGR (TSG + G + AGR, ca. 47%). However, adding G and AGR to the TSG model only slightly improves the prediction (ca. +7%), suggesting that the geochemistry of soils already contains most of the information given by the geology and airborne datasets together, at least with regard to the explanation of indoor radon. From the present analysis performed in the SW of England, it may be concluded that each one of the four datasets is likely to be useful for radon mapping purposes, whether alone or in combination with others. The present work also suggests that the complete soil geochemistry dataset (TSG) is more effective for indoor radon modelling than using just the U (+Th, K) concentration in soil.
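The CODA regression idea above can be sketched briefly. Compositional parts carry only relative information, so they are mapped through a log-ratio transform (here the centred log-ratio, clr, one common CODA choice) before ordinary least squares, and the model is scored by its explained variation (R²). The data below are synthetic placeholders, not the paper's soil or radon measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3-part composition (e.g. U, Th, K shares) for 200 sites.
parts = rng.dirichlet([2.0, 3.0, 5.0], size=200)

def clr(x):
    """Centred log-ratio: log of each part minus the row-wise mean log."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

X = clr(parts)
# Synthetic response loosely tied to the first clr coordinate.
y = 1.5 * X[:, 0] + rng.normal(scale=0.2, size=200)

# Ordinary least squares with an intercept; lstsq handles the rank
# deficiency caused by clr rows summing to zero.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
r2 = 1.0 - resid.var() / y.var()
print(f"explained variation: {100 * r2:.1f}%")
```

Working in log-ratio coordinates avoids the spurious correlations that arise when percentages constrained to sum to a constant are regressed directly.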