Computing Optimal Assignments in Linear Time for Approximate Graph Matching

Kriege, Nils M.; Giscard, Pierre-Louis; Bause, Franka; Wilson, Richard C.

doi:10.1109/icdm.2019.00045

Cited by 19 publications

(50 citation statements)

References 31 publications


“…A method for approximate nearest neighbor search regarding the Wasserstein distance has been proposed recently [1]. Another line of work studies special cases, which allow vector space embeddings, e.g., in the domain of kernels for structured data [15,19,17]. On that basis we develop embeddings of novel assignment-based lower bounds for the graph edit distance, which are effective and allow index-accelerated similarity search.…”

Section: Discussion (mentioning)

confidence: 99%

“…On large graphs, these methods are not feasible and approximations are used [26,8,28,17]. These can be obtained from the exact approaches, e.g., using beam search or linear programming relaxations.…”

Section: Pairwise Computation Of the Graph Edit Distance (mentioning)

confidence: 99%

“…The running time was further reduced by defining ground costs for the assignment problem that are a tree metric [17]. This allows computing an optimal assignment in linear time by associating elements to the nodes of the tree and matching them in a bottom-up fashion.…”

Section: Pairwise Computation Of the Graph Edit Distance (mentioning)

confidence: 99%
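
The bottom-up matching described in the statement above can be illustrated with a minimal sketch. It assumes a rooted tree given as hypothetical `parent` and `weight` arrays in which every node has a larger index than its parent, and two equally sized collections of elements placed on tree nodes. The function name and data layout are illustrative and not taken from the cited paper. The sketch computes only the cost of an optimal assignment, relying on the standard property of tree metrics that each edge contributes its weight times the number of elements that have to cross it.

```python
def tree_assignment_cost(parent, weight, nodes_a, nodes_b):
    """Cost of an optimal assignment between two equally sized collections
    whose elements sit on the nodes of a rooted, edge-weighted tree.

    Assumptions (illustrative, not from the cited paper):
      - node 0 is the root and parent[0] == -1
      - parent[v] < v for every other node v
      - weight[v] is the weight of the edge (v, parent[v])
    """
    if len(nodes_a) != len(nodes_b):
        raise ValueError("both collections must have the same size")

    n = len(parent)
    # surplus[v] = (#elements of A) - (#elements of B) in the subtree of v
    surplus = [0] * n
    for v in nodes_a:
        surplus[v] += 1
    for v in nodes_b:
        surplus[v] -= 1

    cost = 0.0
    # Visit nodes bottom-up; unmatched elements are pushed to the parent,
    # paying the edge weight once per element that crosses the edge.
    for v in range(n - 1, 0, -1):
        cost += weight[v] * abs(surplus[v])
        surplus[parent[v]] += surplus[v]
    return cost


# Small example tree: 0 is the root, 1 and 2 are its children, 3 is a child of 1.
parent = [-1, 0, 0, 1]
weight = [0.0, 1.0, 2.0, 0.5]
example_cost = tree_assignment_cost(parent, weight, [3, 1], [2, 1])
```

In this example one element on node 1 is matched to the other at no cost, and the element on node 3 has to travel along the path 3, 1, 0, 2. The single pass over the nodes is what makes the running time linear in the size of the tree plus the number of elements.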

“…Moreover, similarity search is the fundamental problem when using the graph edit distance in downstream supervised or unsupervised machine learning methods such as 𝑘-nearest neighbors classification. Promising results have been reported for classifying graphs from diverse applications representing, e.g., small molecules [17], petroglyphs [32], or cuneiform signs [16]. However, this approach does not readily scale to large datasets, where embedding-based methods such as graph kernels [18] and graph neural networks [39] have become the dominating techniques.…”

Section: Introduction (mentioning)

confidence: 99%

“…Algorithms for exact [13,20,7,8] or approximate [26,28,17] graph edit distance computation have been extensively studied. They are typically optimized for pairwise comparison but can be accelerated in cases when a distance cutoff is given as part of the input.…”

Section: Introduction (mentioning)

confidence: 99%


EmbAssi: Embedding Assignment Costs for Similarity Search in Large Graph Databases

Bause, Schubert, Kriege

2021

Preprint

Self Cite


The graph edit distance is an intuitive measure to quantify the dissimilarity of graphs, but its computation is NP-hard and challenging in practice. We introduce methods for answering nearest neighbor and range queries regarding this distance efficiently for large databases with up to millions of graphs. We build on the filter-verification paradigm, where lower and upper bounds are used to reduce the number of exact computations of the graph edit distance. Highly effective bounds for this involve solving a linear assignment problem for each graph in the database, which is prohibitive in massive datasets. Index-based approaches typically provide only weak bounds, leading to high computational costs for verification. In this work, we derive novel lower bounds for efficient filtering from restricted assignment problems, where the cost function is a tree metric. This special case allows embedding the costs of optimal assignments isometrically into ℓ1 space, rendering efficient indexing possible. We propose several lower bounds of the graph edit distance obtained from tree metrics reflecting the edit costs, which are combined for effective filtering. Our method termed EmbAssi can be integrated into existing filter-verification pipelines as a fast and effective pre-filtering step. Empirically we show that for many real-world graphs our lower bounds are already close to the exact graph edit distance, while our index construction and search scales to very large databases.
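
As a rough illustration of the isometric ℓ1 embedding mentioned in this abstract, the sketch below maps a collection of elements placed on tree nodes to a vector with one entry per node: the weight of the edge to the parent times the number of elements in that node's subtree. For two collections of equal size, the Manhattan distance between their vectors then equals the optimal assignment cost under the tree metric. The tree layout (`parent`, `weight`, node 0 as root, parents indexed before children) and the function names are assumptions made for this example, not the EmbAssi implementation.

```python
def embed_on_tree(parent, weight, nodes):
    """Map a collection of elements on tree nodes to a vector in l1 space.

    Assumptions (illustrative only): node 0 is the root with parent[0] == -1,
    parent[v] < v for all other nodes, and weight[v] is the weight of the
    edge (v, parent[v]); weight[0] is unused and should be 0.
    """
    n = len(parent)
    count = [0] * n
    for v in nodes:
        count[v] += 1
    # Accumulate subtree counts bottom-up.
    for v in range(n - 1, 0, -1):
        count[parent[v]] += count[v]
    # One coordinate per node: edge weight times subtree count.
    return [weight[v] * count[v] for v in range(n)]


def l1_distance(x, y):
    """Manhattan distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Example tree: 0 is the root, 1 and 2 are its children, 3 is a child of 1.
parent = [-1, 0, 0, 1]
weight = [0.0, 1.0, 2.0, 0.5]
vec_a = embed_on_tree(parent, weight, [3, 1])
vec_b = embed_on_tree(parent, weight, [2, 1])
distance = l1_distance(vec_a, vec_b)
```

Because each collection is embedded once and independently of any query, the vectors can be handed to an index structure that supports the Manhattan distance, which is the property the abstract relies on for similarity search.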


Mining Tree Patterns with Partially Injective Homomorphisms

Schulz, Horváth, Welke, et al.

2019

Machine Learning and Knowledge Discovery in Databases


Learning on graphs, particularly graph classification, requires rich graph representations. A common paradigm to obtain these is by extracting sets of substructures and representing graphs by such sets. The obtained graph representations then enable the application of standard machine learning approaches like support vector machines. Traditionally, graph substructures refer to subgraph patterns which are embedded by subgraph isomorphisms. Identifying subgraph patterns is however computationally infeasible due to the NP-completeness of deciding subgraph isomorphism even when the patterns are restricted to trees. A relaxation of the problem is to consider graph homomorphisms as the pattern matching operator instead, which can in fact be computed in polynomial time for tree patterns. However, graph homomorphisms generally result in less suitable graph representations for classification tasks. A key observation, which has been largely disregarded in the machine learning community, is that subgraph isomorphisms can be regarded as constrained homomorphisms. In this dissertation, we utilize this unifying view of these two pattern embedding operators by considering tractable instances of constrained homomorphisms on tree patterns and design three powerful and efficiently computable graph kernels. To bridge the gap between graph homomorphisms and subgraph isomorphisms, we first introduce the notion of partially injective homomorphisms which require injectivity only for subsets of the patterns' vertex pairs. Utilizing positive complexity results on deciding homomorphisms from bounded treewidth graphs, we present an algorithm mining frequent trees w.r.t. partially injective homomorphisms in incremental polynomial time.
We design a kernel function which measures graph similarity in terms of such mutually occurring patterns and experimentally demonstrate that by bridging the gap between graph homomorphism and subgraph isomorphism, our approach offers an attractive trade-off between efficiency and predictive power. Subsequently, we turn our attention to the popular Weisfeiler-Lehman method. This label propagation algorithm implicitly constructs tree patterns for which the embedding operator is given by locally bijective homomorphisms, another kind of constrained homomorphisms. While such patterns can be very efficiently computed and yield expressive graph representations, comparing graphs in terms of mutually occurring Weisfeiler-Lehman patterns is an often insufficient similarity measure. We propose two approaches to overcome this drawback. Utilizing the concept of graph filtrations, we introduce a graph kernel which compares distributions of Weisfeiler-Lehman patterns over multiple graph resolutions. This approach offers a fine-grained graph similarity by comparing existence intervals of patterns, instead of their cardinalities. We show that this kernel is powerful in terms of distinguishing non-isomorphic graphs and even gives rise to complete graph kernels in certain scenarios. Moreover, the kernel can be generalized to arbit...
