2012
DOI: 10.1007/978-3-642-30397-5_10

A GPU Algorithm for Greedy Graph Matching

Abstract: Greedy graph matching provides us with a fast way to coarsen a graph during graph partitioning. Direct algorithms on the CPU which perform such greedy matchings are simple and fast, but offer few handholds for parallelisation. To remedy this, we introduce a fine-grained shared-memory parallel algorithm for maximal greedy matching, together with an implementation on the GPU, which is faster (speedups up to 6.8 for random matching and 5.6 for weighted matching) than the serial CPU algorithms and produce…
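
For orientation, the serial baseline that the abstract calls "simple and fast" can be sketched as follows. This is a minimal illustration of textbook greedy weighted matching, not the paper's GPU algorithm; the function name `greedy_matching` and the `(u, v, weight)` edge format are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight); hypothetical format for this sketch


def greedy_matching(edges: List[Edge]) -> Dict[int, int]:
    """Return a maximal matching as a vertex -> partner map."""
    mate: Dict[int, int] = {}
    # Visit edges from heaviest to lightest; ties are broken by endpoint
    # ids so that the result is deterministic.
    for u, v, _weight in sorted(edges, key=lambda e: (-e[2], e[0], e[1])):
        # Accept the edge only if it is not a self-loop and both
        # endpoints are still unmatched.
        if u != v and u not in mate and v not in mate:
            mate[u] = v
            mate[v] = u
    return mate


# A 4-cycle: the heaviest edge (1, 2) is taken first, then (3, 0).
edges = [(0, 1, 1.0), (1, 2, 3.0), (2, 3, 1.0), (3, 0, 2.0)]
matching = greedy_matching(edges)
```

Each edge is accepted or rejected based on the decisions made for all heavier edges, which is the sequential dependency that leaves "few handholds for parallelisation".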

Cited by 27 publications (13 citation statements)
References 17 publications
“…However, the work on greedy matchings has to a large extent been driven by a need for developing scalable parallel algorithms for use in scientific applications. This has led to the implementation of Gale-Shapley and McVitie-Wilson type matching algorithms on a large variety of architectures, including distributed memory machines [3,17], multicore computers [10,14,18], and GPUs [1,20].…”
Section: Methods (mentioning)
confidence: 99%
“…We compare implementations of local max, the red-blue algorithm from [6] (RBM) (their implementation), heavy edge matching (HEM) [8], greedy, and the global path algorithm (GPA) [17]. HEM iterates through the nodes (optionally in random order) and matches the heaviest incident edge that is nonadjacent to a previously matched edge.…”
Section: Implementations and Experiments (mentioning)
confidence: 99%
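
The HEM description quoted above can be sketched roughly as follows. This is a minimal serial illustration written from that one-sentence description, not the implementation from [8] or from the citing paper; the function name `heavy_edge_matching`, the adjacency-list format, and the `seed` parameter are assumptions made for this example.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical adjacency format: vertex -> list of (neighbour, weight).
Adjacency = Dict[int, List[Tuple[int, float]]]


def heavy_edge_matching(adj: Adjacency, seed: Optional[int] = None) -> Dict[int, int]:
    """Visit each node once and match it along its heaviest free edge."""
    order = sorted(adj)
    if seed is not None:
        # Optional random visiting order, as mentioned in the quote.
        random.Random(seed).shuffle(order)
    mate: Dict[int, int] = {}
    for u in order:
        if u in mate:
            continue  # every edge at u is adjacent to a matched edge
        # Only edges to unmatched neighbours are still eligible.
        candidates = [(w, v) for v, w in adj[u] if v != u and v not in mate]
        if candidates:
            # Heaviest edge wins; ties go to the smaller neighbour id.
            _w, v = max(candidates, key=lambda c: (c[0], -c[1]))
            mate[u] = v
            mate[v] = u
    return mate


# Path 0-1-2-3 whose middle edge is the heaviest.
adj = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 5.0)],
    2: [(1, 5.0), (3, 1.0)],
    3: [(2, 1.0)],
}
hem = heavy_edge_matching(adj)
```

Visiting the nodes in index order, node 0 takes its only edge, so the heavy middle edge (1, 2) is never matched. A globally sorted greedy matching would pick that edge first, which illustrates why the visiting order matters for HEM.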
“…As a basis for our implementation we use back40computing library by Merrill [19]. Figure 5 compares the running time of our implementation with GPA, sequential local max, the RBM algorithm parallelized for 4 cores, and its GPU parallelization from [6]. While the CPU implementation has troubles recovering from its sequential inefficiency and is only slightly faster than even sequential local max, the GPU implementation is impressively fast in particular for small graphs.…”
Section: GPU Implementation (mentioning)
confidence: 99%
“…2, we will also look at parallelising the other parts of the algorithm. We generate matchings µ on the GPU using the algorithm from [7], where we perform weighted matching with edge weight 2 Ω ω({u, v}) − ζ(u) ζ(v) (cf. eq.…”
Section: Parallelisation Of the Remainder Of Alg (mentioning)
confidence: 99%
“…We present a fine-grained shared-memory parallel algorithm for graph coarsening and apply this algorithm in the context of graph clustering to obtain a fast greedy heuristic for maximising modularity in weighted undirected graphs. This is a follow-up to [7], which was concerned with generating weighted graph matchings on the GPU, in an effort to use the parallel processing power offered by multicore CPUs and GPUs for discrete computing tasks, such as partitioning and clustering of graphs and hypergraphs. Just as generating graph matchings, graph coarsening is an essential aspect of both graph partitioning [4,8,11] and multilevel clustering [21] and therefore forms a logical continuation of the research done in [7].…”
Section: Introduction (mentioning)
confidence: 99%