A Reconfigurable Computing Approach for Efficient and Scalable Parallel Graph Exploration

Betkaoui, Brahim; Wang, Yu; Thomas, David B.; Luk, Wayne

doi:10.1109/asap.2012.30

Cited by 53 publications

(45 citation statements)

References 14 publications


“…A considerable amount of research on parallel BFS implementations on GPUs focuses on level-synchronous or fixed-point methods [19,20]. The reconfigurable hardware approach in solving graph traversal problems on clusters of FPGAs is limited by graph size and synthesis times [4,8]. Betkaoui et al (2012) [4] and Attia et al (2014) [8] explored highly parallelized processing elements (PEs) and decoupled computation memory.…”

Section: Related Work (mentioning)

confidence: 99%

“…However, new parallel computing machines could provide a better platform for software methods. Heterogeneous processing, with reconfigurable logic and field-programmable gate array (FPGAs) as an energy efficient computing systems [7], performs competitively with the multicore CPUs and GPGPUs [4,8]. The performance of breadth-first search (BFS) on large graphs is bound by the access to high-latency external memory.…”

Section: Introduction (mentioning)

confidence: 99%

“…These applications represent the connections, relations, and interaction among entities, such as social networks [2], biological interactions [3], and ground transportation [1]. Poor data-driven computation, unstructured organization, irregular memory access, and low computations-to-memory ratio are the prime reasons for parallel large-graph processing inefficiency [4]. To traverse larger graphs caused by data-intensive applications, a variety of scientific programming methods has been proposed [5,6].…”

Section: Introduction (mentioning)

confidence: 99%


Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data

Huang, Sun, et al. 2017, Scientific Programming


The Internet applications, such as network searching, electronic commerce, and modern medical applications, produce and process massive data. Considerable data parallelism exists in computation processes of data-intensive applications. A traversal algorithm, breadth-first search (BFS), is fundamental in many graph processing applications and metrics when a graph grows in scale. A variety of scientific programming methods have been proposed for accelerating and parallelizing BFS because of the poor temporal and spatial locality caused by inherent irregular memory access patterns. However, new parallel hardware could provide better improvement for scientific methods. To address small-world graph problems, we propose a scalable and novel field-programmable gate array-based heterogeneous multicore system for scientific programming. The core is multithread for streaming processing. And the communication network InfiniBand is adopted for scalability. We design a binary search algorithm to address mapping to unify all processor addresses. Within the limits permitted by the Graph500 test bench after 1D parallel hybrid BFS algorithm testing, our 8-core and 8-thread-per-core system achieved superior performance and efficiency compared with the prior work under the same degree of parallelism. Our system is efficient not as a special acceleration unit but as a processor platform that deals with graph searching applications.


“…Set the vertex number to 16 million with an average arity from 8 to 32, we compare the BFS performance on our platform to that of ASAP12-1F, as shown in Table. III. ASAP12-1F, the single FPGA-based implementation of [5], is a state-of-the-art highly optimized BFS implementation on FPGA for large scale graphs, which is capable of obtaining superior performance to that of multi-core CPU platforms. Using a single Virtex5 LX330 FPGA with 32 PEs, our design outperforms ASAP12-1F using the same type of FPGA with 128 PEs by a factor of 1.25x to 1.32x.…”

Section: Implementation and Performance Evaluation (mentioning)

confidence: 99%

“…The early methods either build a circuit that resembles the graph or use low-latency on-chip memory resources to store the entire graph [1,2,3], but failed to adapt the real world graphs, which are too large to fit into on-chip random-access memory (RAM) of FPGAs. Several recent publications [4,5] described strategies of graph traversal on FPGAs using off-chip DRAM memories to adapt the traversal of large-scale graph instances, and Betkaoui's work [5] is the first FPGA-based BFS implementation that can compete with other high performance multi-core systems.…”

Section: Introduction (mentioning)

confidence: 99%

Parallel graph traversal for FPGA

Dou, Zou, et al. 2014, IEICE Electron. Express


This paper presents a multi-channel memory based architecture for parallel processing of large-scale graph traversal for field-programmable gate array (FPGA). By designing a multi-channel memory subsystem with two DRAM modules and two SRAM chips and developing an optimized pipelining structure for the processing elements, we achieve superior performance to that of a state-of-the-art highly optimized BFS implementation using the same type of FPGA.


Strategic Infrastructural Developments to Reinforce Reconfigurable Computing for Indigenous AI Applications

Khurge, 2023, Artificial Intelligence Applications and Reconfigurable Architectures
