2017 IEEE International Conference on Data Mining (ICDM) 2017
DOI: 10.1109/icdm.2017.40

Telling Cause from Effect Using MDL-Based Local and Global Regression

Abstract: We consider the fundamental problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. The two-variable case is especially difficult to solve since it is not possible to use standard conditional independence tests between the variables. To tackle this problem, we follow an information-theoretic approach based on Kolmogorov complexity and use the Minimum Description Length (MDL) principle to provide a practical solution. In particular, we propose a …

Cited by 23 publications (22 citation statements)
References 24 publications
“…Our key contributions can be summarized as follows, we (a) show how to model unobserved mechanisms via compound deterministic and nondeterministic functions, (b) propose an MDL score for causal inference on pairs of univariate numeric random variables, (c) formulate two analytic significance tests based on compression, (d) introduce the linear-time algorithms Slope and Sloper, (e) give extensive empirical evaluation, including a case study (f) and make all code, data generators and data available. This paper builds upon and extends the work appearing in ICDM'17 [18]. Notably, we provide a link between the confidence and significance score of our method.…”
Section: Introduction
confidence: 70%
“…Regarding regression analysis, Le et al [124] present the geometric-based online Gaussian process that could scale with massive datasets, guaranteeing that the proposed algorithm produces a good enough solution (close to the optimal one) and a fast-online regression. Marx and Vreeken [125] present an information theory-based approach using the Kolmogorov complexity and the principle of minimum description length to provide a practical solution to the problem of inferring the direction of causal dependence of observational data. Rudaś and Jaroszewicz [126] analyze two uplift modeling approaches for linear regression and identify the situations in which each model works best; in fact, they propose a third model that combines the benefits of both approaches.…”
Section: Recent Methods For General Applications
confidence: 99%
“…Under such representation, BNs are generally identifiable if the noises are non-Gaussian (Shimizu et al, 2006), if the functional form of the additive noise model is nonlinear (Hoyer et al, 2009; Zhang and Hyvärinen, 2009), or if the noise variances are equal (Peters and Bühlmann, 2014). Also see much of the recent literature that focuses on bivariate causal discovery (Mooij et al, 2010; Janzing et al, 2012; Chen et al, 2014; Sgouritsa et al, 2015; Hernandez-Lobato et al, 2016; Marx and Vreeken, 2017; Blöbaum et al, 2018; Marx and Vreeken, 2019; Tagasovska et al, 2020). For count data, Categorical Data.…”
Section: Related Work
confidence: 99%