Proceedings of the 2015 SIAM International Conference on Data Mining 2015
DOI: 10.1137/1.9781611974010.53

Scaling log-linear analysis to datasets with thousands of variables

Abstract: Association discovery is a fundamental data mining task. The primary statistical approach to association discovery between variables is log-linear analysis. Classical approaches to log-linear analysis do not scale beyond about ten variables. We have recently shown that, if we ensure that the graph supporting the log-linear model is chordal, log-linear analysis can be applied to datasets with hundreds of variables without sacrificing the statistical soundness [21]. However, further scalability remained limited,…

Cited by 12 publications
(13 citation statements)
References 16 publications
“…We carry out further experiments on another real dataset: "Finance stock performance of the companies" used in [34], which contains 20 years financial performance of 490 companies. The number of samples in the dataset is 3450, where the financial footprints in individual days are considered as samples.…”
Section: Real-life Data Experiments On Finance Stock Performance Of Th
Citation type: mentioning
confidence: 99%
“…the number of variables describing each sampled data point. For example, one may be interested in recovering the gene interaction network over large number of genes based on only a handful of collected gene expression samples [33], or recovering the associations of stock performance in a set of companies based on large number of samples [34]. These problem scenarios make association discovery challenging due to different reasons.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Hence, it is not required to check the candidature and to compute the scoring function of all candidate edges at every step. According to [32], the addition of an edge (a, b) to the candidate model affects the minimal separators between following node pairs: (a) a and the neighbors of b, (b) b and the neighbors of a, and (c) neighbors of a and b. So that we only recompute the candidature and scoring function of above-mentioned node pairs (i.e., edges) which takes O(|V|) times.…”
Section: Scalable Contchordalysis-mml
Citation type: mentioning
confidence: 99%
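
The locality argument in the statement above can be illustrated with a small sketch. It is not taken from the cited papers: the function name `affected_pairs` and the adjacency-dictionary representation are hypothetical, and item (c) is read here as "each neighbor of a paired with each neighbor of b", which is only one possible reading of the quoted text.

```python
from itertools import product


def affected_pairs(adj, a, b):
    """Return the node pairs to re-evaluate after adding edge (a, b).

    `adj` maps each node to the set of its neighbours *before* the
    new edge is added (hypothetical representation, no self-loops).
    """
    na = adj[a] - {b}  # neighbours of a, excluding b itself
    nb = adj[b] - {a}  # neighbours of b, excluding a itself
    pairs = set()
    # (a) a paired with each neighbour of b
    pairs.update(frozenset((a, y)) for y in nb)
    # (b) b paired with each neighbour of a
    pairs.update(frozenset((b, x)) for x in na)
    # (c) ASSUMED reading: each neighbour of a with each neighbour of b
    pairs.update(frozenset((x, y)) for x, y in product(na, nb) if x != y)
    return pairs


# Toy graph: edges 1-2 and 3-4, node 5 isolated; we add edge (1, 3).
adj = {1: {2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
result = affected_pairs(adj, 1, 3)
```

In this toy graph only three pairs are revisited, and the isolated node 5 is never touched, which is the point of the quoted observation: candidate edges far from (a, b) keep their previous candidature and score.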
“…Prioritized Chordalysis [9] observes that, for the same edge (a, b), the score is also the same unless the minimal separator Sab differs. Thus, in a given iteration, an edge needs its score recomputed only if its minimal separator has changed.…”
Section: B Prioritized Chordalysis
Citation type: unclassified
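
The observation about Prioritized Chordalysis amounts to caching each edge's score together with the minimal separator it was computed for. A minimal sketch of that idea follows; the class `ScoreCache`, its `evaluations` counter, and the toy scoring function are hypothetical illustrations, not the scoring function or data structures of the cited work.

```python
class ScoreCache:
    """Cache edge scores, recomputing only when the separator changes."""

    def __init__(self, score_fn):
        self._score_fn = score_fn  # hypothetical: (a, b, separator) -> score
        self._cache = {}           # edge -> (separator, score)
        self.evaluations = 0       # how often score_fn was actually called

    def score(self, a, b, separator):
        edge = frozenset((a, b))   # undirected: (a, b) and (b, a) coincide
        sep = frozenset(separator)
        cached = self._cache.get(edge)
        if cached is not None and cached[0] == sep:
            return cached[1]       # same separator, so reuse the old score
        self.evaluations += 1
        value = self._score_fn(a, b, sep)
        self._cache[edge] = (sep, value)
        return value


# Toy scoring function (an assumption for illustration): separator size.
cache = ScoreCache(lambda a, b, sep: len(sep))
first = cache.score(1, 2, {3})
second = cache.score(2, 1, {3})      # same edge and separator: cache hit
third = cache.score(1, 2, {3, 4})    # separator changed: recomputed
```

Keying the cache on the unordered edge and comparing stored separators keeps the bookkeeping to one dictionary lookup per candidate edge, so the expensive statistical score is evaluated only when the separator has actually changed.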