We consider the problem of predicting a response variable from a set of covariates on a data set that differs in distribution from the training data. Causal parameters are optimal in terms of predictive accuracy if in the new distribution either many variables are affected by interventions or only some variables are affected, but the perturbations are strong. If the training and test distributions differ by a shift, causal parameters might be too conservative to perform well on the above task. This motivates anchor regression, a method that makes use of exogenous variables to solve a relaxation of the ‘causal’ minimax problem by considering a modification of the least‐squares loss. The procedure naturally provides an interpolation between the solutions of ordinary least squares (OLS) and two‐stage least squares. We prove that the estimator satisfies predictive guarantees in terms of distributional robustness against shifts in a linear class; these guarantees are valid even if the instrumental variable assumptions are violated. If anchor regression and least squares provide the same answer (‘anchor stability’), we establish that OLS parameters are invariant under certain distributional changes. Anchor regression is shown empirically to improve replicability and protect against distributional shifts.
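The interpolation between OLS and two-stage least squares described above can be illustrated with a minimal sketch. The abstract does not state the loss itself; the code assumes the commonly stated form, which penalizes the part of the residual explained by the anchors A with a weight gamma, so that gamma = 1 reproduces OLS and large gamma approaches the two-stage least squares solution. The function name, the simulated data, and the true coefficient of 2 are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def anchor_regression(X, Y, A, gamma):
    """Minimize ||(I - P_A)(Y - X b)||^2 + gamma * ||P_A (Y - X b)||^2 over b.

    P_A is the orthogonal projection onto the column space of the anchors A.
    (Assumed form of the anchor loss; no intercept is fitted in this sketch.)
    """
    P = A @ np.linalg.pinv(A)                      # projection onto span(A)
    # W'W = (I - P) + gamma * P, so OLS on (W X, W Y) minimizes the anchor loss.
    W = np.eye(len(Y)) - (1.0 - np.sqrt(gamma)) * P
    coef, *_ = np.linalg.lstsq(W @ X, W @ Y, rcond=None)
    return coef

# Hypothetical simulated data: anchor A -> X, hidden confounder H -> X and Y,
# and X -> Y with causal coefficient 2.
rng = np.random.default_rng(0)
n = 2000
A = rng.normal(size=(n, 1))
H = rng.normal(size=n)
X = (A[:, 0] + H + rng.normal(size=n)).reshape(-1, 1)
Y = 2.0 * X[:, 0] + H + rng.normal(size=n)

b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)      # plain least squares
b_gamma1 = anchor_regression(X, Y, A, gamma=1.0)   # coincides with OLS
b_large = anchor_regression(X, Y, A, gamma=1e6)    # close to two-stage least squares

print(b_ols, b_gamma1, b_large)
```

In this simulated setting the OLS coefficient is biased by the hidden confounder, while the large-gamma solution moves toward the causal coefficient; intermediate values of gamma trade predictive accuracy on the training distribution against protection under shifts.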
This work presents a multivariate methodology combining principal component analysis, the Mahalanobis distance and decision trees for the selection of process factors and their levels in early process development of generic molecules. It is applied to a high throughput study testing more than 200 conditions for the production of a biosimilar monoclonal antibody at microliter scale. The methodology provides the most important selection criteria for the process design in order to improve product quality towards the quality attributes of the originator molecule. Robustness of the selections is ensured by cross-validation of each analysis step. The concluded selections are then successfully validated with an external data set. Finally, the results are compared to those obtained with a widely used software revealing similarities and clear advantages of the presented methodology. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 33:181-191, 2017.
Causal inference is known to be very challenging when only observational data are available. Randomized experiments are often costly and impractical, and in instrumental variable regression the number of instruments has to exceed the number of causal predictors. It was recently shown that causal inference for the full model is possible when data from distinct observational environments are available, exploiting that the conditional distribution of a response variable is invariant under the correct causal model. Two shortcomings of such an approach are the high computational effort for large-scale data and the assumed absence of hidden confounders. Here we show that these two shortcomings can be addressed if one is willing to make a more restrictive assumption on the type of interventions that generate different environments. Thereby, we look at a different notion of invariance, namely inner-product invariance. By avoiding a computationally cumbersome reverse-engineering approach such as in Peters et al. [2016], it allows for large-scale causal inference in linear structural equation models. We discuss identifiability conditions for the causal parameter and derive asymptotic confidence intervals in the low-dimensional setting. In the case of non-identifiability we show that the solution set of causal Dantzig has predictive guarantees under certain interventions. We derive finite-sample bounds in the high-dimensional setting and investigate its performance on simulated datasets. MSC 2010 subject classifications: Primary 62J99, 62H99; secondary 68T99.
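A minimal sketch of how inner-product invariance might be used with two environments follows. The abstract gives no formula; the code assumes the unregularized, exactly identified case, in which the inner product between covariates and residuals, E[X (Y - X'b)], is the same in both environments at the causal b, so that b solves a linear system in the differences of Gram matrices and cross-moments. The function name, the simulated additive interventions, and the true coefficient of 2 are illustrative assumptions.

```python
import numpy as np

def inner_product_invariant_fit(X1, Y1, X2, Y2):
    """Solve (G1 - G2) b = (Z1 - Z2), with G_e = X_e'X_e / n_e and Z_e = X_e'Y_e / n_e.

    Assumes two environments and an invertible difference of Gram matrices
    (a sketch of the low-dimensional, unregularized case only).
    """
    G = X1.T @ X1 / len(Y1) - X2.T @ X2 / len(Y2)
    Z = X1.T @ Y1 / len(Y1) - X2.T @ Y2 / len(Y2)
    return np.linalg.solve(G, Z)

def simulate(n, shift_scale, rng):
    """Hypothetical linear model with a hidden confounder H and an additive
    intervention on X whose strength differs between environments."""
    H = rng.normal(size=n)
    X = shift_scale * rng.normal(size=n) + H + rng.normal(size=n)
    Y = 2.0 * X + H + rng.normal(size=n)           # causal coefficient is 2
    return X.reshape(-1, 1), Y

rng = np.random.default_rng(1)
X1, Y1 = simulate(20000, 0.0, rng)                  # observational environment
X2, Y2 = simulate(20000, 2.0, rng)                  # environment with additive interventions

beta_hat = inner_product_invariant_fit(X1, Y1, X2, Y2)
beta_ols, *_ = np.linalg.lstsq(X1, Y1, rcond=None)  # confounded least-squares fit

print(beta_hat, beta_ols)
```

Because only moment differences are needed, the computation reduces to one linear solve, which is consistent with the abstract's point about avoiding a reverse-engineering search; the high-dimensional, regularized version discussed in the abstract is not covered by this sketch.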
This paper analyzes how moral costs affect individual support of morally difficult group decisions. We study a threshold public good game with moral costs. Motivated by recent empirical findings, we assume that these costs are heterogeneous and consist of three parts. The first one is a standard cost term. The second, shared guilt, decreases in the number of supporters. The third hinges on the notion of being pivotal. We analyze equilibrium predictions, isolate the causal effects of guilt sharing, and compare results to standard utilitarian and nonconsequentialist approaches. As interventions, we study information release, feedback, and fostering individual moral standards. JEL Classification: D02, D03, D23, D63, D82.
We consider identifiability of partially linear additive structural equation models with Gaussian noise (PLSEMs) and estimation of distributionally equivalent models to a given PLSEM. Thereby, we also include robustness results for errors in the neighborhood of Gaussian distributions. Existing identifiability results in the framework of additive SEMs with Gaussian noise are limited to linear and nonlinear SEMs, which can be considered as special cases of PLSEMs with vanishing nonparametric or parametric part, respectively. We close the wide gap between these two special cases by providing a comprehensive theory of the identifiability of PLSEMs by means of (A) a graphical, (B) a transformational, (C) a functional and (D) a causal ordering characterization of PLSEMs that generate a given distribution P. In particular, the characterizations (C) and (D) answer the fundamental question of to what extent nonlinear functions in additive SEMs with Gaussian noise restrict the set of potential causal models and hence influence the identifiability. On the basis of the transformational characterization (B) we provide a score-based estimation procedure that outputs the graphical representation (A) of the distribution equivalence class of a given PLSEM. We derive its (high-dimensional) consistency and demonstrate its performance on simulated datasets.