Geomagnetic indices are convenient quantities that distill the complicated physics of some region or aspect of near-Earth space into a single parameter. Most of the best-known indices, such as Dst, SYM-H, Kp, AE, AL, and PC, are calculated from ground-based magnetometer data sets. Many models have been created to predict the values of these indices, often using solar wind measurements upstream of Earth as the input variables. This document reviews the current state of models that predict geomagnetic indices and the methods used to assess how well they reproduce the target index time series. These existing methods are synthesized into a baseline collection of metrics for benchmarking a new or updated geomagnetic index prediction model. The methods fall into two categories: (1) fit performance metrics, such as root-mean-square error and mean absolute error, applied to a time series comparison of model output and observations, and (2) event detection performance metrics, such as the Heidke Skill Score and probability of detection, derived from a contingency table that compares model and observed values exceeding (or not exceeding) a threshold value. A few examples of codes being used with this set of metrics are presented, and other aspects of metrics assessment best practices, limitations, and uncertainties are discussed, including several caveats to consider when using geomagnetic indices.
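As a minimal sketch of how these two metric families could be implemented (the function names, the NumPy-based layout, and the example storm threshold are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def fit_metrics(obs, model):
    """Time series fit performance metrics: root-mean-square error and mean absolute error."""
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    err = model - obs
    return {"rmse": np.sqrt(np.mean(err ** 2)), "mae": np.mean(np.abs(err))}

def event_metrics(obs, model, threshold, below=True):
    """Event detection performance metrics from a 2x2 contingency table.

    An 'event' is a value crossing the threshold; with below=True an event is a
    value dropping below the threshold (e.g. a negative Dst excursion).
    """
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    obs_evt = obs < threshold if below else obs > threshold
    mod_evt = model < threshold if below else model > threshold
    hits = np.sum(obs_evt & mod_evt)            # event observed and predicted
    misses = np.sum(obs_evt & ~mod_evt)         # event observed, not predicted
    false_alarms = np.sum(~obs_evt & mod_evt)   # event predicted, not observed
    correct_neg = np.sum(~obs_evt & ~mod_evt)   # non-event in both
    n = hits + misses + false_alarms + correct_neg
    pod = hits / (hits + misses)                # probability of detection
    # Heidke Skill Score: fraction correct relative to the number expected by chance
    expected = ((hits + misses) * (hits + false_alarms)
                + (correct_neg + misses) * (correct_neg + false_alarms)) / n
    hss = (hits + correct_neg - expected) / (n - expected)
    return {"pod": pod, "hss": hss}
```

For example, `fit_metrics(dst_obs, dst_model)` and `event_metrics(dst_obs, dst_model, threshold=-50.0)` would score a Dst prediction both as a continuous time series and as a detector of intervals below -50 nT; the -50 nT value here is only a commonly used illustrative storm threshold, not one prescribed by the paper.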
Solar flares are extremely energetic phenomena in our Solar System. Their impulsive, often drastic radiative increases, particularly at short wavelengths, have immediate impacts that motivate solar physics and space weather research to understand solar flares well enough to forecast them. As data and algorithms improve dramatically, we must ask how well the forecasting performs and, crucially, how to rigorously measure performance in order to critically gauge any improvements. Building upon earlier-developed methodology (Barnes et al. 2016, Paper I), international representatives of regional warning centers and research facilities assembled in 2017 at the Institute for Space-Earth Environmental Research, Nagoya University, Japan, to directly compare, for the first time, the performance of operational solar flare forecasting methods. Multiple quantitative evaluation metrics are employed, with focus and discussion on evaluation methodologies given the restrictions of operational forecasting. Numerous methods performed consistently above the "no skill" level, although which method scored top marks is decisively a function of flare event definition and the metric used; there was no single winner. Subsequent papers in this series ask why the performances differ by examining implementation details (Leka et al. 2019, Paper III) and present a novel analysis method for evaluating temporal patterns of forecasting errors (Park et al. 2019, Paper IV). With these works, this team presents a well-defined and robust methodology for evaluating solar flare forecasting methods in both research and operational frameworks, and today's performance benchmarks against which improvements and new methods may be compared.
The Met Office Space Weather Operations Centre produces 24/7/365 space weather guidance, alerts, and forecasts for a wide range of government and commercial end-users across the United Kingdom. Solar flare forecasts are one of its products and are issued multiple times a day in two forms: forecasts for each active region on the solar disk over the next 24 h and full-disk forecasts for the next 4 days. Here the forecasting process is described in detail, together with a first verification of archived forecasts using methods commonly applied in operational weather prediction. Real-time verification available for operational flare forecasting use is also described. The influence of human forecasters is highlighted: human-edited forecasts outperform the original model results, and forecasting skill decreases over longer forecast lead times.
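As an illustrative sketch of one verification measure widely used in operational weather prediction, the following computes a Brier skill score for probabilistic flare forecasts against a climatological reference; the paper does not specify this exact form, and the function and variable names are assumptions for illustration only.

```python
import numpy as np

def brier_skill_score(prob_forecast, occurred):
    """Brier skill score of probabilistic forecasts relative to sample climatology.

    prob_forecast : forecast probabilities in [0, 1] (e.g. daily probability of an M-class flare)
    occurred      : 1 if the event was observed in the valid period, else 0
    """
    p = np.asarray(prob_forecast, dtype=float)
    o = np.asarray(occurred, dtype=float)
    bs = np.mean((p - o) ** 2)                # Brier score of the forecasts
    climatology = o.mean()                    # event base rate over the verification sample
    bs_ref = np.mean((climatology - o) ** 2)  # Brier score of the climatological reference
    return 1.0 - bs / bs_ref                  # positive values indicate skill over climatology
```

Comparing this score for forecasts at different lead times, or for human-edited versus raw model forecasts, is one simple way to express the kind of skill differences reported here.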
Accurate forecasting of the arrival time and subsequent geomagnetic impacts of coronal mass ejections (CMEs) at Earth is an important objective for space weather forecasting agencies. Recently, the CME Arrival and Impact working team has made significant progress toward defining community-agreed metrics and validation methods to assess the current state of CME modeling capabilities. This will allow the community to quantify current capabilities and track model progress over time. First, it is crucial that the community focus on collecting the metadata necessary for transparency and reproducibility of results. Concerning CME arrival and impact, we have identified six metadata types: 3-D CME measurement, model description, model input, CME (non)arrival observation, model output data, and metrics and validation methods. Second, the working team has also identified a validation time period, in which all events within the following two periods will be considered: 1
This series of papers grew out of a 2017 workshop at the Institute for Space-Earth Environmental Research, Nagoya University, convened to quantitatively compare the performance of today's operational solar flare forecasting facilities. Building upon Paper I of this series (Barnes et al. 2016), in Paper II (Leka et al. 2019) we described the participating methods for this latest comparison effort and the evaluation methodology, and we presented quantitative comparisons. In this paper we focus on the behavior and performance of the methods when evaluated in the context of broad implementation differences. Acknowledging the short testing interval and the small number of participating methods, we find that forecast performance: 1) appears to improve by including persistence or prior flare activity, region evolution, and a human "forecaster in the loop"; 2) is hurt by restricting data to disk-center observations; and 3) may benefit from long-term statistics, but mostly when these are combined with modern data sources and statistical approaches. These trends are arguably weak and must be viewed with numerous caveats, as discussed both here and in Paper II. Following the present work, we present in Paper IV a novel analysis method for evaluating temporal patterns of forecasting errors of both types (i.e., misses and false alarms; Park et al. 2019). Most importantly, with this series of papers we demonstrate techniques for facilitating comparisons in the interest of establishing performance-positive methodologies.