Background. Mendelian randomization (MR) uses genetic variants as instrumental variables to estimate the causal effect of risk exposures in epidemiology. Two-sample summary-data MR, which uses publicly available genome-wide association study (GWAS) summary data, has become a popular design in practice. With the sample size of GWAS continuing to increase, it is now possible to utilize genetic instruments that are only weakly associated with the exposure. Methods. To maximize the statistical power of MR, we propose a genome-wide design where more than a thousand genetic instruments are used. For the statistical analysis, we use an empirical partially Bayes approach where instruments are weighted according to their true strength, so that weak instruments contribute less variance to the estimator. The final estimator is highly efficient in the presence of many weak genetic instruments and is robust to balanced and/or sparse pleiotropy. Results. We apply our method to estimate the causal effect of body mass index (BMI) and major blood lipids on cardiovascular disease outcomes. Compared to previous MR studies, we obtain much more precise causal effect estimates and substantially shorter confidence intervals. Some new and statistically significant findings are: the estimated causal odds ratio of BMI on ischemic stroke is 1.19 (95% CI: 1.07-1.32, p-value ≤ 0.001); the estimated causal odds ratio of high-density lipoprotein cholesterol (HDL-C) on coronary artery disease (CAD) is 0.78 (95% CI: 0.73-0.84, p-value ≤ 0.001). However, the estimated effect of HDL-C becomes substantially smaller and statistically non-significant when we only use the strong instruments. Conclusions. By employing a genome-wide design and robust statistical methods, the statistical power of MR studies can be greatly improved. Our empirical results suggest that, even though the relationship between HDL-C and CAD appears to be highly heterogeneous, it may be too soon to completely dismiss the HDL hypothesis. Further investigations are needed to demystify the observational and genetic associations between HDL-C and CAD.
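To make the two-sample summary-data setting concrete, the sketch below combines per-variant GWAS estimates into a single causal estimate. It uses the standard inverse-variance weighted (IVW) estimator, which is a simpler stand-in and not the empirical partially Bayes estimator described in the abstract. The function names and the toy inputs are assumptions made for illustration only.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate from summary statistics.

    NOTE: this is the textbook IVW estimator, shown as an assumption-laden
    illustration. It ignores estimation error in the variant-exposure
    effects, which is one reason many weak instruments can bias it.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / sy ** 2          # precision of the variant-outcome estimate
        numerator += weight * bx * by
        denominator += weight * bx * bx
    beta = numerator / denominator       # causal effect on the log-odds scale
    se = math.sqrt(1.0 / denominator)    # first-order standard error
    return beta, se


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a normal-theory CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical toy data: outcome effects are exactly half the exposure effects.
bx = [0.10, 0.20, 0.05]
by = [0.05, 0.10, 0.025]
sy = [0.01, 0.02, 0.01]
beta_hat, se_hat = ivw_estimate(bx, by, sy)
print(odds_ratio_ci(beta_hat, se_hat))
```

Because each variant enters with weight proportional to its squared exposure effect over the outcome variance, weakly associated variants contribute little here too, but without the correction for weak-instrument bias that the abstract's method is designed to provide.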
In this paper we present several methods to identify precursors that show great promise for early predictions of solar flare events. A data preprocessing pipeline is built to extract useful data from multiple sources, Geostationary Operational Environmental Satellites and Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI), to prepare inputs for machine learning algorithms. Two classification models are presented: classification of flares from quiet times for active regions and classification of strong versus weak flare events. We adopt deep learning algorithms to capture both spatial and temporal information from HMI magnetogram data. Effective feature extraction and feature selection with raw magnetogram data using deep learning and statistical algorithms enable us to train classification models to achieve almost as good performance as using active region parameters provided in HMI/Space‐Weather HMI‐Active Region Patch (SHARP) data files. Case studies show a significant increase in the prediction score around 20 hr before strong solar flare events.
In this work, we develop gradient boosting machines (GBMs) for forecasting the SYM‐H index multiple hours ahead using different combinations of solar wind and interplanetary magnetic field (IMF) parameters, derived parameters, and past SYM‐H values. Using Shapley Additive Explanation values to quantify the contributions from each input to predictions of the SYM‐H index from GBMs, we show that our predictions are consistent with physical understanding while also providing insight into the complex relationship between the solar wind and Earth's ring current. In particular, we found that feature contributions vary depending on the storm phase. We also perform a direct comparison between GBMs and neural networks presented in prior publications for forecasting the SYM‐H index by training, validating, and testing them on the same data. We find that the GBMs yield a statistically significant improvement in root mean squared error over the best published black‐box neural network schemes and the Burton equation.
A deep learning network, long short-term memory (LSTM), is used to predict whether an active region (AR) will produce a flare of class Γ in the next 24 hr. We consider Γ to be ≥M (strong flare), ≥C (medium flare), and ≥A (any flare) class. The essence of using LSTM, which is a recurrent neural network, is its ability to capture temporal information on the data samples. The input features are time sequences of 20 magnetic parameters from the space weather Helioseismic and Magnetic Imager AR patches. We analyze ARs from 2010 June to 2018 December and their associated flares identified in the Geostationary Operational Environmental Satellite X-ray flare catalogs. Our results produce skill scores consistent with recently published results using LSTMs and are better than the previous results using a single time input. The skill scores from the model show statistically significant variation when different years of data are chosen for training and testing. In particular, 2015–2018 have better true skill statistic and Heidke skill scores for predicting ≥C medium flares than 2011–2014, when the difference in flare occurrence rates is properly taken into account.
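The true skill statistic (TSS) and Heidke skill score (HSS) named above are both computed from a binary confusion matrix. The following is a minimal sketch, assuming the commonly used definitions (the HSS form shown is the one often labeled HSS2 in flare-forecasting work). The function names and the example counts are hypothetical and do not come from the paper.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = recall - false alarm rate; insensitive to the class ratio."""
    return tp / (tp + fn) - fp / (fp + tn)


def heidke_skill_score(tp, fn, fp, tn):
    """HSS: improvement over a random forecast; depends on the class ratio."""
    numerator = 2.0 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


# Hypothetical counts: 50 flaring samples, 150 non-flaring samples.
tp, fn, fp, tn = 40, 10, 20, 130
print(true_skill_statistic(tp, fn, fp, tn))
print(heidke_skill_score(tp, fn, fp, tn))
```

TSS does not change when the ratio of flaring to non-flaring samples changes, whereas HSS does, which is one reason differences in flare occurrence rates between years need to be accounted for when comparing scores.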
We develop a mixed long short-term memory (LSTM) regression model to predict the maximum solar flare intensity within a 24-hr time window 0-24, 6-30, 12-36, and 24-48 hr ahead of time using 6, 12, 24, and 48 hr of data (predictors) for each Helioseismic and Magnetic Imager (HMI) Active Region Patch (HARP). The model makes use of (1) the Space-Weather HMI Active Region Patch (SHARP) parameters as predictors and (2) the exact flare intensities instead of class labels recorded in the Geostationary Operational Environmental Satellites (GOES) data set, which serves as the source of the response variables. Compared to solar flare classification, the model offers us more detailed information about the exact maximum flux level, that is, intensity, for each occurrence of a flare. We also consider classification models built on top of the regression model and obtain better results in solar flare classifications as compared to Chen et al. (2019, https://doi.org/10.1029/2019SW002214). Our results suggest that the most efficient time period for predicting the solar activity is within 24 hr before the prediction time using the SHARP parameters and the LSTM model.
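One way to picture a classifier built on top of a regression output is to threshold the predicted peak flux at the standard GOES class boundaries. The sketch below assumes the conventional decade boundaries in W/m² (B at 1e-7, C at 1e-6, M at 1e-5, X at 1e-4). The function names are hypothetical, and this is not a description of how the paper's classification models are constructed.

```python
# Lower bounds of the GOES soft X-ray classes, in W/m^2, strongest first.
GOES_CLASS_FLOORS = [("X", 1e-4), ("M", 1e-5), ("C", 1e-6), ("B", 1e-7)]


def flux_to_class(peak_flux):
    """Map a peak X-ray flux (W/m^2) to its GOES class letter."""
    if peak_flux <= 0:
        raise ValueError("peak flux must be positive")
    for letter, floor in GOES_CLASS_FLOORS:
        if peak_flux >= floor:
            return letter
    return "A"  # anything below the B-class floor


def is_strong_flare(peak_flux, threshold=1e-5):
    """Binary label from a regression output; threshold defaults to M class."""
    return peak_flux >= threshold


# Hypothetical predicted peak fluxes from a regression model.
for flux in (5e-8, 3e-7, 4e-6, 2.5e-5, 1e-4):
    print(flux, flux_to_class(flux), is_strong_flare(flux))
```

Keeping the continuous flux as the model output and applying the threshold afterwards means the same regression model can serve several classification tasks by changing only `threshold`.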