We propose an instrumental variable (IV) selection procedure that combines agglomerative hierarchical clustering with the Hansen-Sargan overidentification test to select valid instruments for IV estimation from a large set of candidate instruments. Some of the instruments may be invalid in the sense that they may fail the exclusion restriction. We show that under the plurality rule, our method can achieve oracle selection and estimation results. Compared to previous IV selection methods, our method has the advantages that it can deal with the weak instruments problem effectively, and that it can be easily extended to settings with multiple endogenous regressors and heterogeneous treatment effects. We conduct Monte Carlo simulations to examine the performance of our method and compare it with two existing methods, the Hard Thresholding method (HT) and the Confidence Interval method (CIM). The simulation results show that our method achieves oracle selection and estimation results in both single and multiple endogenous regressor settings in large samples when all the instruments are strong. Our method also works well when some of the candidate instruments are weak, outperforming HT and CIM. We apply our method to the estimation of the effect of immigration on wages in the US.
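The selection idea described above can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that clustering is applied to one scalar estimate per candidate instrument, and it uses Ward's merging criterion as one common agglomerative choice. The Hansen-Sargan test that the procedure uses to decide where to stop along the clustering path is not implemented here; the number of clusters is fixed by hand instead. The function names and the example numbers are hypothetical.

```python
def ward_path_1d(estimates):
    """Agglomerative clustering of scalar estimates with Ward's criterion.

    Returns the full merge path: a list of partitions, starting with one
    cluster per instrument and ending with a single cluster. Each cluster
    is a sorted list of instrument indices.
    """
    clusters = [[i] for i in range(len(estimates))]
    path = [[c[:] for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                mean_a = sum(estimates[i] for i in clusters[a]) / na
                mean_b = sum(estimates[i] for i in clusters[b]) / nb
                # Ward's criterion: increase in within-cluster sum of squares
                cost = na * nb / (na + nb) * (mean_a - mean_b) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        path.append([c[:] for c in clusters])
    return path


def largest_cluster(partition):
    """Plurality rule: keep the cluster with the most instruments."""
    return max(partition, key=len)


# Hypothetical per-instrument estimates: four agree near 1.0, three do not.
estimates = [0.98, 1.02, 1.00, 1.01, 2.5, 2.6, -0.4]
path = ward_path_1d(estimates)

# Assumption: stop at three clusters. The procedure described in the
# abstract would instead pick the stopping point with the Hansen-Sargan test.
n_clusters = 3
partition = path[len(estimates) - n_clusters]
selected = largest_cluster(partition)
print(selected)
```

In this toy setting the instruments that share a common estimate form the largest group, which is what the plurality rule requires for the valid instruments to be identifiable.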
Mendelian randomization (MR) is an epidemiological approach that uses genetic variants as instrumental variables for estimating the causal effect of a modifiable but likely confounded exposure on an outcome. Standard MR usually assumes that all included genetic variants are valid instruments and that there is a single homogeneous causal effect of the exposure on the outcome. We allow violations of both assumptions, such that the variants can be divided into clusters identifying distinct causal effects driven by different biological mechanisms and/or horizontal pleiotropy. We adapt the Agglomerative Hierarchical Clustering (AHC) method developed for individual-level data to the summary data MR setting, enabling the detection of such variant clusters using only genome-wide summary statistics. We also extend the method to handle two outcomes and a common exposure to aid investigation of the mechanisms of multimorbidity. We conduct Monte Carlo simulations to evaluate the performance of our 'MR-AHC' algorithm compared to the existing MR-Clust method, showing that it is both reliable and computationally efficient: it detects variant clusters with high accuracy and is much faster than MR-Clust. In an applied example, we use our method to analyze the causal effects of high body fat percentage on a pair of well-known multimorbid conditions, type 2 diabetes and osteoarthritis, discovering distinct variant clusters reflecting the heterogeneous shared pathways.
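To make the summary-data setting concrete, the sketch below computes per-variant ratio estimates from variant-exposure and variant-outcome associations and a Cochran-type Q statistic that measures how much the variants in a group disagree. This is a generic illustration under stated assumptions: the first-order standard error ignores uncertainty in the exposure association, and the abstract does not say that MR-AHC uses exactly this statistic. All function names and numbers are hypothetical.

```python
def wald_ratios(beta_exposure, beta_outcome, se_outcome):
    """Per-variant ratio estimates and first-order standard errors.

    Assumption: the standard error ignores uncertainty in beta_exposure.
    """
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ses = [s / abs(bx) for bx, s in zip(beta_exposure, se_outcome)]
    return ratios, ses


def cochran_q(ratios, ses):
    """Inverse-variance weighted pooled estimate and heterogeneity Q."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - pooled) ** 2 for w, r in zip(weights, ratios))
    return pooled, q


# Hypothetical summary statistics: three variants imply the same effect.
bx = [0.10, 0.20, 0.05]
by = [0.05, 0.10, 0.025]
se = [0.01, 0.01, 0.01]
ratios, ses = wald_ratios(bx, by, se)
pooled_same, q_same = cochran_q(ratios, ses)

# Adding a variant that implies a different effect inflates Q.
ratios_mixed, ses_mixed = wald_ratios(bx + [0.10], by + [0.20], se + [0.01])
pooled_mixed, q_mixed = cochran_q(ratios_mixed, ses_mixed)
print(q_same, q_mixed)
```

A small Q within a group is consistent with the variants identifying one causal effect, while a large Q suggests the group mixes variants from different clusters.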