Mining statistically significant connected subgraphs in vertex labeled graphs

Arora, Akhil; Sachan, Mayank; Bhattacharya, Arnab

doi:10.1145/2588555.2588574

Cited by 14 publications

(8 citation statements)

References 30 publications


“…This is not a trivial question as the search space in graph mining is often exponentially larger than that in itemset mining due to combinations of vertices and edges. In this paper, we give a positive answer to this question by (1) extending the approach by Terada et al [32] to solve the important open problem of significant subgraph mining with multiple testing correction via frequent subgraph mining [7,15,23,40], (2) proposing efficient search strategies for detecting testable subgraphs, one of which is empirically orders of magnitude faster than their method, and (3) further improving over naïve Bonferroni correction by considering the dependence between subgraph occurrences [22,24].…”

Section: Introduction (mentioning)

confidence: 99%

Significant Subgraph Mining with Multiple Testing Correction

Sugiyama, Llinares-López, Kasenburg, et al. 2015

Proceedings of the 2015 SIAM International Conference on Data Mining


The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. It was shown to lead to greater statistical power, the discovery of more truly significant itemsets, than the standard Bonferroni correction on real-world datasets. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power.
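The idea of excluding untestable hypotheses can be illustrated with a small sketch. It is not the search strategy of the cited paper. It assumes a one-sided Fisher exact test and a Tarone-style correction, in which a pattern counts toward the correction factor only if its smallest attainable p-value could reach the corrected threshold. The function names `min_attainable_p` and `tarone_factor` and the toy supports are hypothetical, chosen for illustration only.

```python
from math import comb


def min_attainable_p(x, n1, n):
    """Smallest one-sided Fisher p-value for a pattern that occurs in x of
    n samples, when n1 of the n samples belong to the positive class.

    The most extreme table puts as many occurrences as possible into the
    positive class, so the p-value is a single hypergeometric probability.
    """
    a = min(x, n1)
    return comb(n1, a) * comb(n - n1, x - a) / comb(n, x)


def tarone_factor(supports, n1, n, alpha=0.05):
    """Smallest integer k such that at most k patterns are testable at
    level alpha / k (Tarone-style correction factor)."""
    min_ps = [min_attainable_p(x, n1, n) for x in supports]
    k = 1
    # The loop stops at k = len(supports) at the latest.
    while sum(p <= alpha / k for p in min_ps) > k:
        k += 1
    return k


# Toy data: five patterns, 20 samples, 10 of them positive.
supports = [1, 1, 1, 2, 10]
k = tarone_factor(supports, n1=10, n=20)
bonferroni_threshold = 0.05 / len(supports)  # divides by all 5 patterns
tarone_threshold = 0.05 / k                  # divides by testable patterns only
```

In this toy case only the pattern with support 10 can ever reach significance, so the correction factor is smaller than the Bonferroni factor of five and the per-pattern threshold is less strict.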



“…However, when the data is noisy and considered to be a random sample from the population of interest, it is desired to provide statistical significance measures such as p-values or confidence intervals for each of the discovered patterns. Although several researchers in data mining community studied how to compute statistical significances of the discovered patterns [11,12,13,14], the reported p-values in these studies are biased in the sense that the selection effect of the mining algorithms are not taken into account (unless a multiple testing correction procedure is applied to these p-values afterward).…”

Section: Related Approaches (mentioning)

confidence: 99%

Selective Inference Approach for Statistically Sound Predictive Pattern Mining

Suzumura, Nakagawa, Sugiyama, et al. 2016

Preprint


Discovering statistically significant patterns from databases is an important challenging problem. The main obstacle of this problem is in the difficulty of taking into account the selection bias, i.e., the bias arising from the fact that patterns are selected from extremely large number of candidates in databases. In this paper, we introduce a new approach for predictive pattern mining problems that can address the selection bias issue. Our approach is built on a recently popularized statistical inference framework called selective inference. In selective inference, statistical inferences (such as statistical hypothesis testing) are conducted based on sampling distributions conditional on a selection event. If the selection event is characterized in a tractable way, statistical inferences can be made without minding selection bias issue. However, in pattern mining problems, it is difficult to characterize the entire selection process of mining algorithms. Our main contribution in this paper is to solve this challenging problem for a class of predictive pattern mining problems by introducing a novel algorithmic framework. We demonstrate that our approach is useful for finding statistically significant patterns from databases.


“…Rather, identifying the statistically significant attribute associations where the pattern of the attribute association deviates from the expected, can potentially infer undiscovered possible relationships between nodes in the graph. The statistical significance of a pattern has been emphasized in various data mining problems [12], [13], [7], [14], [15] and the previous works already explored why a statistically significant pattern is more important rather than a frequent pattern. Thus, in this paper we define a statistically significant attribute association and address the problem of uncovering it in attributed graphs.…”

Section: Introduction (mentioning)

confidence: 99%

Mining Statistically Significant Attribute Associations in Attributed Graphs

Lee, Park, Prabhakar 2016

2016 IEEE 16th International Conference on Data Mining (ICDM)


Recently, graphs have been widely used to represent many different kinds of real world data or observations such as social networks, protein-protein networks, road networks, and so on. In many cases, each node in a graph is associated with a set of its attributes and it is critical to not only consider the link structure of a graph but also use the attribute information to achieve more meaningful results in various graph mining tasks. Most previous works with attributed graphs take into account attribute relationships only between individually connected nodes. However, it should be greatly valuable to find out which sets of attributes are associated with each other and whether they are statistically significant or not. Mining such significant associations, we can uncover novel relationships among the sets of attributes in the graph. We propose an algorithm that can find those attribute associations efficiently and effectively, and show experimental results that confirm the high applicability of the proposed algorithm.

