A fundamental difficulty of causal learning is that causal models generally cannot be fully identified from observational data alone. Interventional data, that is, data originating from different experimental environments, improves identifiability. However, the improvement depends critically on the target and nature of the interventions carried out in each experiment. Since experiments tend to be costly in real applications, there is a need to perform the right interventions so that as few as possible are required. In this work we propose a new active learning (i.e. experiment selection) framework (A-ICP) based on Invariant Causal Prediction (ICP) [27]. For general structural causal models, we characterize the effect of interventions on so-called stable sets, a notion introduced by [30]. We leverage these results to propose several intervention selection policies for A-ICP which quickly reveal the direct causes of a response variable in the causal graph while maintaining the error control inherent in ICP. Empirically, we analyze the performance of the proposed policies in both population and finite-regime experiments.
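The invariance idea that ICP builds on can be illustrated with a small simulation. The sketch below is not the A-ICP procedure or any of its intervention selection policies; it only shows, under an assumed toy linear model X1 -> Y -> X2 (the coefficients, sample sizes, seeds, and function names are all hypothetical choices made for illustration), that regressing the response on its direct cause gives a stable slope across environments, while regressing on a non-cause does not.

```python
import random


def simulate(n, x1_sd, seed):
    """Draw n samples from the assumed toy model X1 -> Y -> X2.

    x1_sd is the standard deviation of X1; changing it between
    environments plays the role of a soft intervention on X1.
    """
    rng = random.Random(seed)
    x1s, ys, x2s = [], [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, x1_sd)
        y = 2.0 * x1 + rng.gauss(0.0, 1.0)   # Y depends only on its parent X1
        x2 = y + rng.gauss(0.0, 1.0)         # X2 is a child of Y
        x1s.append(x1)
        ys.append(y)
        x2s.append(x2)
    return x1s, ys, x2s


def slope(xs, ys):
    """Least-squares slope of ys on xs (simple regression with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Two environments: observational (sd 1) and an intervened X1 (sd 3).
environments = [simulate(5000, 1.0, seed=0), simulate(5000, 3.0, seed=1)]

# Slope of Y on its direct cause X1: the population value is 2 in both.
parent_slopes = [slope(x1s, ys) for x1s, ys, _ in environments]

# Slope of Y on its child X2: the population value is Var(Y) / (Var(Y) + 1),
# i.e. 5/6 in the first environment and 37/38 in the second.
child_slopes = [slope(x2s, ys) for _, ys, x2s in environments]

print("Y ~ X1 slopes per environment:", parent_slopes)
print("Y ~ X2 slopes per environment:", child_slopes)
```

In this toy setting, only the set {X1} yields a regression that looks the same in both environments, which is the kind of evidence ICP uses to accept a set of predictors as plausible direct causes. A real ICP analysis replaces the informal slope comparison with statistical tests over all candidate sets.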
Causal inference is understood to be a very challenging problem with observational data alone. Without additional strong assumptions, it is typically only possible given access to data arising from perturbing the underlying system. To identify causal relations among a collection of covariates and a target or response variable, existing procedures rely on at least one of the following assumptions: i) the target variable remains unperturbed, ii) the hidden variables remain unperturbed, and iii) the hidden effects are dense. In this paper, we consider a perturbation model for interventional data (involving soft and hard interventions) over a collection of Gaussian variables that does not satisfy any of these conditions and can be viewed as a mixed-effects linear structural causal model. We propose a maximum-likelihood estimator, dubbed DirectLikelihood, that exploits system-wide invariances to uniquely identify the population causal structure from perturbation data. Our theoretical guarantees also carry over to settings where the variables are non-Gaussian but are generated according to a linear structural causal model. Further, we demonstrate that the population causal parameters are solutions to a worst-case risk with respect to distributional shifts from a certain perturbation class. We illustrate the utility of our perturbation model and the DirectLikelihood estimator on synthetic data as well as real data involving protein expressions.