2003
DOI: 10.1145/996546.996553

A blocked all-pairs shortest-paths algorithm

Abstract: We propose a blocked version of Floyd's all-pairs shortest-paths algorithm. The blocked algorithm makes better utilization of cache than does Floyd's original algorithm. Experiments indicate that the blocked algorithm delivers a speedup (relative to the unblocked Floyd's algorithm) between 1.6 and 1.9 on a Sun Ultra Enterprise 4000/5000 for graphs that have between 480 and 3200 vertices. The measured speedup on an SGI O2 for graphs with between 240 and 1200 vertices is between 1.6 and 2.
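The abstract does not spell out the blocking scheme, so the following is only a minimal sketch of the commonly described three-phase tiling of Floyd's algorithm, which may differ in detail from the paper's version. The names `floyd_warshall`, `blocked_floyd_warshall`, `_update_tile`, and the block-size parameter `b` are illustrative assumptions, not taken from the paper. Pure Python shows the loop structure only; it will not reproduce the cache speedups reported above.

```python
import math


def floyd_warshall(dist):
    """Unblocked Floyd's algorithm on a square list-of-lists cost matrix."""
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def _update_tile(d, i0, j0, k0, b, n):
    """Relax tile (i0, j0) through intermediate vertices k0 .. k0+b-1."""
    for k in range(k0, min(k0 + b, n)):
        for i in range(i0, min(i0 + b, n)):
            for j in range(j0, min(j0 + b, n)):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]


def blocked_floyd_warshall(dist, b):
    """Tiled variant (assumed three-phase scheme) with b x b tiles."""
    n = len(dist)
    d = [row[:] for row in dist]
    starts = range(0, n, b)
    for k0 in starts:
        # Phase 1: the diagonal tile depends only on itself.
        _update_tile(d, k0, k0, k0, b, n)
        # Phase 2: tiles sharing a block row or column with the diagonal tile.
        for t in starts:
            if t != k0:
                _update_tile(d, k0, t, k0, b, n)
                _update_tile(d, t, k0, k0, b, n)
        # Phase 3: all remaining tiles, using the phase 2 results.
        for i0 in starts:
            if i0 == k0:
                continue
            for j0 in starts:
                if j0 != k0:
                    _update_tile(d, i0, j0, k0, b, n)
    return d
```

Each tile is reused for `b` consecutive values of `k` before the computation moves on, which is the source of the improved cache behavior; a suitable `b` depends on the cache size of the target machine.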

Cited by 63 publications (39 citation statements)
References 16 publications
“…Since, OpenCL does not support recursion so implementation of recursive function is done in host program which calls OpenCL kernel recursively. This Kleene's based parallel recursive algorithm shows a significant speedup over OpenCL parallel Floyd Warshall's algorithm over same GPU. [8] It is a blocked organization of Floyd Warshall's all pairs shortest paths algorithm to make better utilization of cache. Several models for computer with different organization of memory has been developed although L2 cache is architecture dependent. La Marca and Ladner develop a model for single level direct mapped cache. They used this model to analyze the performance of binary heaps and cache aligned d-heaps and optimized the cache performance for several sorting methods. Authors obtained a lower bound for the L1 and L2 cache miss rate by determining the minimum number of cache misses and making the reasonable assumption that cache optimization will not decrease the total memory references.…”
Section: International Journal Of Computer Applications (0975-8887)
Citation type: mentioning
confidence: 99%
“…Our work is similar to the work by Venkataraman et. al [17] but unlike their work we have proposed OpenCL based implementation involving high level of parallelism, data reuse that fully exploits architectural benefits of GPU as a low cost computational resource.…”
Section: Problem Time Complexity
Citation type: mentioning
confidence: 99%
“…Unlike BFS, Floyd-Warshall's algorithm (FW) [18], [19] has O(V^3) time complexity, which is irrelevant to the graph sparsity. Blocked FW algorithm [20] is an improved version of FW algorithm. Not only is it more efficient than the basic FW algorithm, it is also more suitable for GPU implementation.…”
Section: E All-pair Shortest Paths
Citation type: mentioning
confidence: 99%
“…The whole adjacency matrix is first converted to the cost matrix C, where each element C_ij represents the path-length of a voxel-pair (i, j). Then the cost matrix is divided into r sub-blocks of equal size [20]. The outer loop iterates over the r primary blocks (the blocks along the diagonal of the matrix).…”
Section: BFS on CPU
Citation type: mentioning
confidence: 99%