Telling Cause from Effect Using MDL-Based Local and Global Regression

Vreeken, Jilles

doi:10.1109/icdm.2017.40

Cited by 23 publications

(22 citation statements)

References 24 publications


“…Our key contributions can be summarized as follows, we (a) show how to model unobserved mechanisms via compound deterministic and nondeterministic functions, (b) propose an MDL score for causal inference on pairs of univariate numeric random variables, (c) formulate two analytic significance tests based on compression, (d) introduce the linear-time algorithms Slope and Sloper, (e) give extensive empirical evaluation, including a case study (f) and make all code, data generators and data available. This paper builds upon and extends the work appearing in ICDM'17 [18]. Notably, we provide a link between the confidence and significance score of our method.…”
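The quoted contributions include significance tests based on compression. As a hedged illustration only, the Python sketch below shows the generic no-hypercompression bound that such tests can rest on: if the data were produced by a null code, any other code compresses it by k or more extra bits with probability at most 2^-k. Treating the costlier causal direction as the null, the function names, and the `alpha` default are assumptions; the two tests in the paper may be formulated differently.

```python
def no_hypercompression_bound(bits_null: float, bits_alt: float) -> float:
    """Upper bound on P(L_null(D) - L_alt(D) >= k), with k = bits_null - bits_alt.

    By the no-hypercompression inequality, if the data D really came from the
    null model, an alternative code saves k or more bits with probability at
    most 2**-k.
    """
    gain = bits_null - bits_alt
    if gain <= 0:
        return 1.0  # the alternative does not compress better: the bound is vacuous
    return 2.0 ** (-gain)


def is_significant(bits_null: float, bits_alt: float, alpha: float = 0.001) -> bool:
    # alpha = 0.001 is an illustrative (hypothetical) threshold, not a value from the paper
    return no_hypercompression_bound(bits_null, bits_alt) <= alpha


# Hypothetical description lengths (in bits) for the two causal directions
bits_y_to_x, bits_x_to_y = 5210.0, 5200.0
print(no_hypercompression_bound(bits_y_to_x, bits_x_to_y))  # 0.0009765625
print(is_significant(bits_y_to_x, bits_x_to_y))             # True
```

A gap of 10 bits thus bounds the chance of a spurious decision by 2^-10, roughly one in a thousand; every additional bit of compression halves that bound.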

Section: Introduction (mentioning)

confidence: 70%

Telling cause from effect by local and global regression

Vreeken

2018

Knowl Inf Syst

Self Cite


We consider the problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. This case is especially challenging because the graph X → Y is Markov equivalent to the graph Y → X, and hence the correct direction cannot be determined using conditional independence tests. To tackle this problem, we follow an information-theoretic approach based on the algorithmic Markov condition. This postulate states that, in terms of Kolmogorov complexity, the factorization given by the true causal model is the most succinct description of the joint distribution. This means that we can infer that X is a likely cause of Y when we need fewer bits to first transmit the data over X, and then the data of Y as a function of X, than for the inverse direction. That is, in this paper we perform causal inference by compression. To put this notion into practice, we employ the Minimum Description Length principle and propose a score that determines how many bits we need to transmit the data using a class of regression functions that can model both local and global functional relations. To determine whether an inference, i.e. the difference in compressed sizes, is significant, we propose two analytical significance tests based on the no-hypercompression inequality. Last, but not least, we introduce the linear-time algorithms Slope and Sloper, and through a thorough empirical evaluation we show that they outperform the state of the art by a wide margin.
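The compression-based comparison described in this abstract can be illustrated with a deliberately simplified Python sketch. It is not Slope: it fits only global polynomials of degree 1 to 3 (no local regression), charges a BIC-style 0.5·log2(n) bits per parameter, and encodes marginals and residuals with a quantized Gaussian code; all of these choices, and every function name, are assumptions made for brevity. X is inferred to cause Y when L(X) + L(Y | X) < L(Y) + L(X | Y).

```python
import numpy as np


def gaussian_bits(residuals: np.ndarray, resolution: float = 1e-3) -> float:
    """Bits to transmit values quantized to `resolution` under a zero-mean Gaussian
    with maximum-likelihood variance: (n/2) log2(2*pi*e*var) - n log2(resolution)."""
    n = residuals.size
    var = max(float(np.mean(residuals ** 2)), 1e-12)  # guard against a perfect fit
    return 0.5 * n * np.log2(2 * np.pi * np.e * var) - n * np.log2(resolution)


def param_bits(k: int, n: int) -> float:
    # BIC-style cost of k real-valued parameters; an assumption, not Slope's encoding
    return 0.5 * k * np.log2(n)


def marginal_bits(v: np.ndarray) -> float:
    # L(X): Gaussian code around the sample mean, plus the cost of mean and variance
    return gaussian_bits(v - v.mean()) + param_bits(2, v.size)


def conditional_bits(target: np.ndarray, source: np.ndarray, max_degree: int = 3) -> float:
    """L(target | source): cheapest global polynomial fit of degree 1..max_degree."""
    n = target.size
    best = np.inf
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(source, target, degree)
        residuals = target - np.polyval(coeffs, source)
        bits = (np.log2(max_degree)            # which degree was chosen
                + param_bits(degree + 2, n)    # degree+1 coefficients and the noise variance
                + gaussian_bits(residuals))
        best = min(best, bits)
    return best


def infer_direction(x: np.ndarray, y: np.ndarray, max_degree: int = 3):
    bits_xy = marginal_bits(x) + conditional_bits(y, x, max_degree)  # L(X) + L(Y | X)
    bits_yx = marginal_bits(y) + conditional_bits(x, y, max_degree)  # L(Y) + L(X | Y)
    if bits_xy == bits_yx:
        return "undecided", bits_xy, bits_yx
    return ("X -> Y" if bits_xy < bits_yx else "Y -> X"), bits_xy, bits_yx


rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 2000)
y = x ** 3 + rng.normal(0.0, 1.0, 2000)
print(infer_direction(x, y)[0])  # X -> Y
```

Because both directions encode the same 2n quantized values, the `resolution` term cancels in the comparison. With `max_degree=1` the two totals coincide, since for least-squares lines var(X)·resvar(Y | X) = var(Y)·resvar(X | Y) = var(X)var(Y) − cov(X, Y)². This mirrors the Markov-equivalence problem described above: in this sketch, only the higher-degree fits can break the tie.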


“…Regarding regression analysis, Le et al [124] present the geometric-based online Gaussian process that could scale with massive datasets, guaranteeing that the proposed algorithm produces a good enough solution (close to the optimal one) and a fast-online regression. Marx and Vreeken [125] present an information theory-based approach using the Kolmogorov complexity and the principle of minimum description length to provide a practical solution to the problem of inferring the direction of causal dependence of observational data. Rudaś and Jaroszewicz [126] analyze two uplift modeling approaches for linear regression and identify the situations in which each model works best; in fact, they propose a third model that combines the benefits of both approaches.…”

Section: Recent Methods For General Applications (mentioning)

confidence: 99%

Predictive Data Mining Techniques for Fault Diagnosis of Electric Equipment: A Review

et al. 2020


Data mining is a technological and scientific field that has gained importance in many areas over the years, attracting scientists, developers, and researchers around the world. This enthusiasm stems from its remarkable benefits, such as the exploitation of large databases and the intelligent use of the information extracted from them through analysis and knowledge discovery. This document reviews the predictive data mining techniques used for the diagnosis and detection of faults in electric equipment, which constitutes a pillar of any industrialized country. Covering the period from 2000 to the present, it reviews the methods used in classification and regression tasks for the diagnosis of electric equipment. Current research on data mining techniques is also listed and discussed according to the results obtained by different authors.


“…Under such representation, BNs are generally identifiable if the noises are non-Gaussian (Shimizu et al, 2006), if the functional form of the additive noise model is nonlinear (Hoyer et al, 2009;Zhang and Hyvärinen, 2009), or if the noise variances are equal (Peters and Bühlmann, 2014). Also see much of the recent literature that focuses on bivariate causal discovery (Mooij et al, 2010;Janzing et al, 2012;Chen et al, 2014;Sgouritsa et al, 2015;Hernandez-Lobato et al, 2016;Marx and Vreeken, 2017;Blöbaum et al, 2018;Marx and Vreeken, 2019;Tagasovska et al, 2020). For count data, Categorical Data.…”
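To make the nonlinear additive-noise case in this excerpt concrete, the following Python sketch fits a cubic polynomial in both directions and scores each with a biased HSIC estimate between the input and the residuals, preferring the direction whose residuals look independent. It is written in the spirit of Hoyer et al. (2009) but is not their procedure: the polynomial regressor, the Gaussian kernel with median-heuristic bandwidth, the absence of a formal independence test, and all function names are assumptions chosen for brevity.

```python
import numpy as np


def _centered_rbf_gram(v: np.ndarray) -> np.ndarray:
    """Gaussian-kernel Gram matrix with a median-heuristic bandwidth, double-centered (H K H)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    K = np.exp(-d2 / np.median(d2[d2 > 0]))
    return K - K.mean(axis=0) - K.mean(axis=1)[:, None] + K.mean()


def hsic(a: np.ndarray, b: np.ndarray) -> float:
    """Biased empirical HSIC, trace(H K H L) / n**2; larger values mean stronger dependence."""
    n = a.size
    return float(np.sum(_centered_rbf_gram(a) * _centered_rbf_gram(b))) / n ** 2


def anm_direction(x: np.ndarray, y: np.ndarray, degree: int = 3):
    # Regress each variable on the other with a (hypothetical) polynomial model
    res_y = y - np.polyval(np.polyfit(x, y, degree), x)
    res_x = x - np.polyval(np.polyfit(y, x, degree), y)
    # In the true causal direction the residuals should be independent of the input
    dep_xy, dep_yx = hsic(x, res_y), hsic(y, res_x)
    return ("X -> Y" if dep_xy < dep_yx else "Y -> X"), dep_xy, dep_yx


rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, 400)
y = x ** 3 + rng.normal(0.0, 1.0, 400)
print(anm_direction(x, y)[0])
```

In the backward fit the residuals of X given Y are visibly heteroscedastic (wide where Y is near zero, narrow in the tails), which is exactly the dependence the HSIC score picks up; the kernel matrices make this O(n²) in memory, so it suits only small samples.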

Section: Related Work (mentioning)

confidence: 99%

Ordinal Causal Discovery

Yang, Mallick

2022

Preprint


Causal discovery for purely observational, categorical data is a long-standing and challenging problem. Unlike for continuous data, the vast majority of existing methods for categorical data only infer the Markov equivalence class, which leaves the direction of some causal relationships undetermined. This paper proposes an identifiable ordinal causal discovery method that exploits the ordinal information contained in many real-world applications to uniquely identify the causal structure. The proposed method is applicable beyond ordinal data via data discretization. Through real-world and synthetic experiments, we demonstrate that the proposed ordinal causal discovery method, combined with simple score-and-search algorithms, has favorable and robust performance compared to state-of-the-art alternative methods on both ordinal categorical and non-categorical data. An accompanying R package, OCD, is freely available at https://web.stat.tamu.edu/~yni/files/OCD_0.1.0.tar.gz.

