Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015
DOI: 10.1145/2783258.2783363

Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing

Abstract: We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, which are statistically significantly enriched in one of two classes. Our method corrects rigorously for multiple hypothesis testing and correlations between patterns through the Westfall-Young permutation procedure, which empirically estimates the null distribution of pattern frequencies in each class via permutations. In our experiments, Westfall-Young light dramatically outperforms the current state-of-…
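The Westfall-Young permutation procedure named in the abstract can be illustrated with a brute-force sketch. It assumes a two-sided Fisher's exact test as the per-pattern test and an explicitly enumerated list of patterns, each given as the set of sample indices where it occurs; all function names are hypothetical. It is not Westfall-Young light itself: the pattern enumeration and pruning that make that algorithm fast and memory-efficient are omitted, so every pattern is re-tested in every permutation.

```python
import math
import random


def hypergeom_pmf(a, x, n1, n):
    """P(A = a): a pattern with support x falls a times into the n1 positives out of n samples."""
    return math.comb(n1, a) * math.comb(n - n1, x - a) / math.comb(n, x)


def fisher_pvalue(a, x, n1, n):
    """Two-sided Fisher's exact test: total probability of tables no more likely than the observed one."""
    lo, hi = max(0, x - (n - n1)), min(x, n1)
    p_obs = hypergeom_pmf(a, x, n1, n)
    probs = (hypergeom_pmf(k, x, n1, n) for k in range(lo, hi + 1))
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def pattern_pvalues(patterns, labels):
    """p-value of every pattern; each pattern is the set of sample indices it occurs in."""
    n, n1 = len(labels), sum(labels)
    return [fisher_pvalue(sum(labels[i] for i in occ), len(occ), n1, n) for occ in patterns]


def westfall_young_threshold(patterns, labels, alpha=0.05, n_perm=1000, seed=0):
    """Estimate a corrected threshold delta from the per-permutation minimum p-values.

    A pattern is declared significant when its p-value is strictly below delta, so at most
    floor(alpha * n_perm) permutations would have produced a false positive (FWER <= alpha).
    """
    rng = random.Random(seed)
    perm = list(labels)
    min_pvals = []
    for _ in range(n_perm):
        rng.shuffle(perm)  # permuting labels breaks any pattern/class association
        min_pvals.append(min(pattern_pvalues(patterns, perm)))
    min_pvals.sort()
    return min_pvals[int(alpha * n_perm)]


# Toy data: 20 samples, the first 10 belong to the positive class.
labels = [1] * 10 + [0] * 10
patterns = [
    set(range(10)),                      # occurs exactly in the positives
    set(range(0, 20, 2)),                # every other sample: no enrichment
    set(range(5)) | set(range(10, 15)),  # balanced across classes
]
delta = westfall_young_threshold(patterns, labels)
observed = pattern_pvalues(patterns, labels)
significant = [i for i, p in enumerate(observed) if p < delta]
print(significant)  # → [0]
```

Because delta is read off the distribution of the minimum p-value over all patterns, correlations between patterns are reflected in that distribution rather than ignored, which is why this kind of correction tends to be less conservative than a Bonferroni bound.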

Cited by 55 publications (51 citation statements)
References 38 publications (50 reference statements)
“…Other methods: While our work focuses on deriving information theoretic FS criteria that capture high-order feature interactions, there are studies in the literature that provide answers from other perspectives. For example, there is a recent group of works for significance pattern mining (Terada et al 2013;Llinares-López et al 2015;Papaxanthos et al 2016): finding groups of items (i.e. features) that occur statistically significant more often in one class than in the other, and rigorously controlling the family-wise error rate (FWER).…”
Section: Background on Feature Selection (mentioning)
confidence: 99%
“…In our setting, and due to the fact that all possible regions are considered, the computational considerations make it extremely challenging to naively apply permutation testing. Nevertheless, combining with the approach proposed in Llinares-López et al (2015b ), which uses Tarone’s method as a way to speed-up permutation testing, would be an interesting topic for future work. Enhancing with permutation testing would also have additional benefits, such as taking into account the dependence between test statistics to obtain less stringent significance thresholds, thereby increasing statistical power.…”
Section: Discussion (mentioning)
confidence: 99%
“…Tarone’s trick has been recently applied to (i) itemset mining ( Terada et al , 2013 ; Minato et al , 2014 ; Llinares-López et al , 2015b ), (ii) subgraph mining ( Sugiyama et al , 2015 ), and (iii) to mine associated genomic regions with the previously mentioned algorithm ( Llinares-López et al , 2015a ). However, none of these methods were able to incorporate covariates to correct for confounding.…”
Section: Problem Statement (mentioning)
confidence: 99%
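Tarone's trick, cited in the statements above, rests on the fact that with fixed class sizes a pattern's support bounds how small its p-value can get. The sketch below assumes a two-sided Fisher's exact test and uses hypothetical function names. It computes this minimum attainable p-value psi(x) and then Tarone's corrected threshold: the smallest k for which the number m(k) of testable patterns (those with psi(x) <= alpha / k) is at most k, giving delta = alpha / k.

```python
import math


def hypergeom_pmf(a, x, n1, n):
    """P(A = a): a pattern with support x falls a times into the n1 positives out of n samples."""
    return math.comb(n1, a) * math.comb(n - n1, x - a) / math.comb(n, x)


def fisher_pvalue(a, x, n1, n):
    """Two-sided Fisher's exact test p-value for cell count a."""
    lo, hi = max(0, x - (n - n1)), min(x, n1)
    p_obs = hypergeom_pmf(a, x, n1, n)
    probs = (hypergeom_pmf(k, x, n1, n) for k in range(lo, hi + 1))
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def min_attainable_pvalue(x, n1, n):
    """psi(x): the smallest p-value any pattern with support x could ever reach."""
    lo, hi = max(0, x - (n - n1)), min(x, n1)
    return min(fisher_pvalue(a, x, n1, n) for a in range(lo, hi + 1))


def tarone_threshold(supports, n1, n, alpha=0.05):
    """Return (delta, k) with k the smallest integer such that m(k) <= k."""
    psi = [min_attainable_pvalue(x, n1, n) for x in supports]
    k = 1
    # m(k) counts patterns that could still reach the level alpha / k;
    # the loop stops at the latest when k >= len(supports).
    while sum(p <= alpha / k for p in psi) > k:
        k += 1
    return alpha / k, k


# 20 samples (10 positive); 105 candidate patterns with the given supports.
supports = [1] * 50 + [2] * 50 + [10] * 5
delta, k = tarone_threshold(supports, n1=10, n=20)
print(k, delta)  # k == 5, delta == 0.05 / 5
```

Here patterns of support 1 or 2 can never reach p <= 0.05, so only the five support-10 patterns count toward the correction; a Bonferroni correction over all 105 candidates would instead use 0.05 / 105.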
“…Especially in the life sciences, however, it is essential to determine whether a detected pattern is also statistically significant within a particular dataset or class. This is the basic premise of significant pattern mining (SPM) algorithms, which have already been successfully employed in itemset mining and subgraph mining tasks ( Llinares-López et al , 2015 ). Recent work by Papaxanthos et al (2016) demonstrated their applicability in genome-wide association mapping.…”
Section: Statistical Shapelet Mining (mentioning)
confidence: 99%