Many common problems in epidemiologic and clinical research involve estimating the effect of an exposure on an outcome while blocking the exposure's effect on an intermediate variable. Effects of this kind are termed direct effects. Estimation of direct effects arises frequently in research aimed at understanding mechanistic pathways by which an exposure acts to cause or prevent disease, as well as in many other settings. Although multivariable regression is commonly used to estimate direct effects, this approach requires assumptions beyond those required for the estimation of total causal effects. In addition, multivariable regression estimates a particular type of direct effect: the effect of an exposure on the outcome when the intermediate is fixed at a specified level. Using the counterfactual framework, we distinguish this definition of a direct effect (Type 1 direct effect) from an alternative definition, in which the effect of the exposure on the intermediate is blocked, but the intermediate is otherwise allowed to vary as it would in the absence of exposure (Type 2 direct effect). When the intermediate and exposure interact to affect the outcome, these two types of direct effects address distinct research questions. Relying on examples, we illustrate the difference between Type 1 and Type 2 direct effects. We propose an estimation approach for Type 2 direct effects that can be implemented using standard statistical software and illustrate its implementation using a numerical example. We also review the assumptions underlying our approach, which are less restrictive than those proposed by previous authors.
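The two definitions can be written compactly in counterfactual notation. The symbols below are an assumption for illustration, since the abstract itself gives none: a binary exposure $A$, an intermediate $Z$, $Y_{a,z}$ for the outcome that would occur under exposure level $a$ with the intermediate set to $z$, and $Z_a$ for the value the intermediate would take under exposure level $a$.

```latex
\begin{align*}
\text{Type 1 direct effect (intermediate fixed at } z\text{):}\quad
  & E\!\left[Y_{1,z}\right] - E\!\left[Y_{0,z}\right] \\
\text{Type 2 direct effect (intermediate as if unexposed):}\quad
  & E\!\left[Y_{1,Z_0}\right] - E\!\left[Y_{0,Z_0}\right]
\end{align*}
```

Under this notation, if the individual contrast $Y_{1,z} - Y_{0,z}$ does not depend on $z$ (no interaction between exposure and intermediate), the two expressions coincide; otherwise the Type 1 effect changes with the chosen level $z$, while the Type 2 effect averages over the distribution of $Z_0$.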
van der Laan and Dudoit (2003) provide a road map for estimation and performance assessment where a parameter of interest is defined as the risk minimizer for a suitable loss function and candidate estimators are generated using a loss function. After briefly reviewing this approach, this article proposes a general deletion/substitution/addition algorithm for minimizing, over subsets of variables (e.g., basis functions), the empirical risk of subset-specific estimators of the parameter of interest. This algorithm provides us with a new class of loss-based cross-validated algorithms in prediction of univariate outcomes, which can be extended to handle multivariate outcomes, conditional density and hazard estimation, and censored outcomes such as survival. In the context of regression, using polynomial basis functions, we study the properties of the deletion/substitution/addition algorithm in simulations and apply the method to detect transcription factor binding sites in yeast gene expression experiments.
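The deletion, substitution, and addition moves can be illustrated with a small search over subsets of basis functions. What follows is a simplified sketch, not the authors' implementation: it assumes squared-error loss, treats each column of a design matrix as one basis function, and omits the cross-validation step that would choose the final subset size. The names `empirical_risk` and `dsa_search` are hypothetical.

```python
import numpy as np


def empirical_risk(X, y, subset):
    """Mean squared error of a least-squares fit on the chosen columns plus an intercept."""
    columns = [np.ones(len(y))] + [X[:, j] for j in sorted(subset)]
    A = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((y - A @ beta) ** 2))


def dsa_search(X, y, max_size):
    """Search subsets of columns with deletion, substitution and addition moves.

    Returns a dict mapping subset size -> (best empirical risk found, subset).
    A move is accepted only if it strictly improves the best risk recorded
    for the size of the subset it leads to, so the search terminates.
    """
    n_terms = X.shape[1]
    current = frozenset()
    best = {0: (empirical_risk(X, y, current), current)}
    while True:
        k = len(current)
        outside = [j for j in range(n_terms) if j not in current]
        moves = [
            # deletion: drop one term (size k - 1)
            (k - 1, [current - {j} for j in current]),
            # substitution: swap one term for an unused one (size k)
            (k, [(current - {j}) | {m} for j in current for m in outside]),
            # addition: add one unused term (size k + 1), if allowed
            (k + 1, [current | {m} for m in outside] if k < max_size else []),
        ]
        for size, candidates in moves:
            if not candidates:
                continue
            risk, candidate = min(
                ((empirical_risk(X, y, c), c) for c in candidates),
                key=lambda pair: pair[0],
            )
            if size not in best or risk < best[size][0]:
                best[size] = (risk, candidate)
                current = candidate
                break  # restart from the new subset, trying deletions first
        else:
            return best  # no move improved any size-specific best risk


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    # Synthetic outcome that depends only on columns 0 and 2.
    y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
    for size, (risk, subset) in sorted(dsa_search(X, y, max_size=3).items()):
        print(size, sorted(subset), round(risk, 4))
```

Because empirical risk can only decrease as terms are added, the size-specific results would in practice be compared on held-out data, which is the role cross-validation plays in the approach the abstract describes.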
Many alternative data-adaptive algorithms can be used to learn a predictor based on observed data. Examples of such learners include decision trees, neural networks, support vector regression, least angle regression, logic regression, and the Deletion/Substitution/Addition algorithm. The optimal learner for prediction will vary depending on the underlying data-generating distribution. In this article we introduce the "super learner", a prediction algorithm that applies any set of candidate learners and uses cross-validation to select between them. Theory shows that asymptotically the super learner performs essentially as well as or better than any of the candidate learners. In this article we present the theory behind the super learner, and illustrate its performance using simulations. We further apply the super learner to a data example, in which we predict the phenotypic antiretroviral susceptibility of HIV based on viral genotype. Specifically, we apply the super learner to predict susceptibility to a specific protease inhibitor, nelfinavir, using a set of database-derived non-polymorphic treatment-selected mutations.
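The idea of using cross-validation to choose among candidate learners can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses squared-error loss, V-fold cross-validation, and three toy candidates (a mean predictor, ordinary least squares, and k-nearest neighbours) standing in for the learners listed in the abstract. All function names are hypothetical.

```python
import numpy as np


def fit_mean(X, y):
    """Candidate 1: predict the training mean for every observation."""
    mu = y.mean()
    return lambda X_new: np.full(len(X_new), mu)


def fit_linear(X, y):
    """Candidate 2: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta


def fit_knn(X, y, k=5):
    """Candidate 3: average outcome of the k nearest training points."""
    def predict(X_new):
        dist = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


def cv_risks(X, y, learners, n_folds=5, seed=0):
    """Cross-validated mean squared error for each candidate learner."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    totals = {name: 0.0 for name in learners}
    for val_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[val_idx] = False
        for name, fit in learners.items():
            predict = fit(X[train], y[train])  # fit on the training folds only
            totals[name] += np.sum((y[val_idx] - predict(X[val_idx])) ** 2)
    return {name: total / len(y) for name, total in totals.items()}


def discrete_super_learner(X, y, learners, n_folds=5, seed=0):
    """Pick the candidate with the lowest cross-validated risk, then refit it on all data."""
    risks = cv_risks(X, y, learners, n_folds=n_folds, seed=seed)
    best = min(risks, key=risks.get)
    return best, learners[best](X, y), risks


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)
    learners = {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}
    best, predict, risks = discrete_super_learner(X, y, learners)
    print(best, {name: round(r, 3) for name, r in risks.items()})
```

Which candidate wins depends on the data-generating distribution, which is the point the abstract makes: on data with a different structure, the same procedure could select a different learner.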