Accelerating SSSP for Power-Law Graphs

Chi, Yuze; Guo, Licheng; Cong, Jason

doi:10.1145/3490422.3502358

Cited by 18 publications

(5 citation statements)

References 38 publications

“…Unlike the eager algorithm, there are no parameters that need to be adjusted, as the removals are performed according to the distances and costs of neighbors. Recent works such as those presented in [17,30,31] validated the effectiveness of that approach in achieving efficient parallelism by partially or fully using the proposed method.…”

Section: Shortest-path Problem (mentioning)

confidence: 83%

Analysis and Construction of Hardware Accelerators for Calculating the Shortest Path in Real-Time Robot Route Planning

Esteves, Oliveira, Farias. 2024. Electronics

This study introduces an optimization approach for calculating the shortest path in mobile robot route planning. The proposed solution targets real-time processing requirements by offering a high-performance alternative. This is achieved by embedding in the dedicated hardware an architecture which emphasizes parallelism. Through improvements in parallel exploration techniques, our solution aims to present not only a boost in performance but also a dynamic adaptation to graph changes, accommodating randomly occurring edge insertions or deletions as environmental conditions fluctuate. We present the developed architecture alongside its results. Our method efficiently updates obstacle matrices, resulting in a remarkable 120-fold improvement for 1024-node graphs. When utilizing a cost-effective device like the Cyclone IV E, it achieves approximately 12 times the performance of software applications.

“…In our case, this would require t ≈ 4.3 · 10¹² operations and even by exploiting parallelization, it would be prohibitively expensive. The best algorithm for parallel single source shortest paths for powerlaw graphs similar to ours only gives a speedup of 2.4x on a 32-thread CPU [53]. Even with such efficient algorithms, we would require data processing in the order of Teraflops.…”

Section: A Alternative Graph Metrics (mentioning)

confidence: 96%

A Graph-Based Stratified Sampling Methodology for the Analysis of (Underground) Forums

Tizio, Siu, Hutchings et al. 2023. IEEE Trans. Inform. Forensic Secur.

Researchers analyze underground forums to study abuse and cybercrime activities. Due to the size of the forums and the domain expertise required to identify criminal discussions, most approaches employ supervised machine learning techniques to automatically classify the posts of interest. [Goal] Human annotation is costly. How to select samples to annotate that account for the structure of the forum? [Method] We present a methodology to generate stratified samples based on information about the centrality properties of the population and evaluate classifier performance. [Result] We observe that by employing a sample obtained from a uniform distribution of the post degree centrality metric, we maintain the same level of precision but significantly increase the recall (+30%) compared to a sample whose distribution is respecting the population stratification. We find that classifiers trained with similar samples disagree on the classification of criminal activities up to 33% of the time when deployed on the entire forum.

“…As for HBM-specific optimizations, [22] presents an HLS design that applies a similar floorplanning strategy and achieves a design frequency of 237 MHz when using 18 HBM channels. The majority of the recent HBM-based accelerators [19], [20], [23] are HLS designs but are not able to get more than 190 MHz and use more than 28 HBM channels. [21] implements a hash join accelerator that uses 32 HBM channels while running at 250 MHz and its random memory accesses are through 256-bit wide AXI interfaces.…”

Section: Related Work (mentioning)

confidence: 99%

“…Finally, the HBM stacks are physically connected to a datacenter FPGA's bottom die only, making it difficult to spread the resource utilization across multiple FPGA dies to achieve desirable timing closure. To the best of our knowledge, none of the published HBM-based accelerator designs [19], [20], [21], [22], [23] is able to fully utilize the entire bandwidth of the 32 HBM channels.…”

Section: Introduction (mentioning)

confidence: 99%

TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-based FPGAs

Qiao, Guo, Fang et al. 2022. Preprint (self-citation)

The emergence of high-bandwidth memory (HBM) brings new opportunities to boost the performance of sorting acceleration on FPGAs, which was conventionally bounded by the available off-chip memory bandwidth. However, it is nontrivial for designers to fully utilize this immense bandwidth. First, the existing sorter designs cannot be directly scaled at the increasing rate of available off-chip bandwidth, as the required on-chip resource usage grows at a much faster rate and would bound the sorting performance in turn. Second, designers need an in-depth understanding of HBM's characteristics to effectively utilize the HBM bandwidth. To tackle these challenges, we present TopSort, a novel two-phase sorting solution optimized for HBM-based FPGAs. In the first phase, 16 merge trees work in parallel to fully utilize 32 HBM channels' bandwidth. In the second phase, TopSort reuses the logic from phase one to form a wider merge tree to merge the partially sorted results from phase one. TopSort also adopts HBM-specific optimizations to reduce resource overhead and improve bandwidth utilization. TopSort can sort up to 4 GB data using all 32 HBM channels, with an overall sorting performance of 15.6 GB/s. TopSort is 6.7× and 2.2× faster than state-of-the-art CPU and FPGA sorters.
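The two-phase structure described in this abstract can be illustrated with a minimal software sketch. This is not TopSort's hardware design: the function name `two_phase_sort` and the parameter `num_trees` are hypothetical, the phase-one sorters run sequentially here rather than in parallel, and Python's `heapq.merge` stands in for the wider merge tree of phase two.

```python
import heapq
from typing import List


def two_phase_sort(data: List[int], num_trees: int = 16) -> List[int]:
    """Sort `data` in two phases (illustrative only, not the TopSort design)."""
    if not data:
        return []
    # Phase 1: split the input into at most `num_trees` chunks and sort each
    # chunk independently. In hardware these sorters would work in parallel;
    # here they run one after another.
    chunk = -(-len(data) // num_trees)  # ceiling division
    runs = [sorted(data[i:i + chunk]) for i in range(0, len(data), chunk)]
    # Phase 2: one wide k-way merge combines the partially sorted runs.
    return list(heapq.merge(*runs))
```

Calling `two_phase_sort(values, num_trees=4)` on a list of integers returns the same result as `sorted(values)`; the split into phases only changes how the work is organized.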
