2011
DOI: 10.1145/2019583.2019584

Spatial hardware implementation for sparse graph algorithms in GraphStep

Abstract: How do we develop programs that are easy to express, easy to reason about, and able to achieve high performance on massively parallel machines? To address this problem, we introduce GraphStep, a domain-specific compute model that captures algorithms that act on static, irregular, sparse graphs. In GraphStep, algorithms are expressed directly without requiring the programmer to explicitly manage parallel synchronization, operation ordering, placement, or scheduling details. Problems in the sparse graph domain a…
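The abstract describes a compute model in which all parallel synchronization and ordering is implicit in a global step structure. As a rough illustration only (a software sketch under assumed semantics, not the paper's spatial hardware implementation, and with function and parameter names invented here), a GraphStep-style step can be modeled as: every node with pending messages fires a user-supplied update, sends a message along its static out-edges, and a barrier separates one step from the next.

```python
# Minimal software sketch of a GraphStep-style compute model (illustrative,
# not the paper's hardware implementation). Each global "graph step" lets
# every active node update its state from incoming messages and send new
# messages along its out-edges; a barrier separates steps, so the user
# never manages synchronization, ordering, placement, or scheduling.

def graph_step(edges, state, inbox, update):
    """Run one barrier-synchronized graph step.

    edges:  dict node -> list of successor nodes (static, sparse graph)
    state:  dict node -> node state (mutated in place)
    inbox:  dict node -> list of messages received last step
    update: user function (state, messages) -> (new_state, out_message or None)
    Returns the inbox for the next step.
    """
    next_inbox = {n: [] for n in edges}
    for node, msgs in inbox.items():
        if not msgs:
            continue  # node received nothing: inactive this step
        state[node], out = update(state[node], msgs)
        if out is not None:
            for succ in edges[node]:  # fan out along static out-edges
                next_inbox[succ].append(out)
    return next_inbox
```

For example, graph reachability fits this model directly: seed the source node's inbox, and iterate steps until no messages remain.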

Cited by 18 publications (13 citation statements)
References 26 publications
“…Similarly, conventional processors perform poorly on irregular graph operations. The GraphStep architecture showed how to organize active computations around embedded memories in the FPGA to accelerate graph processing [88].…”
Section: E. Integrated Memory (citation type: mentioning; confidence: 99%)
“…The phenomenon here is closely related to the ones explored in DeHon [2015], and we similarly show that it is often better to distribute the data and computation than to centralize it in a single memory. GraphStep [Delorimier et al 2011] provides one concrete model for how applications might be defined to allow this form of parallelism tuning. In the remainder of this section, we explain and model the opposing communication energy effects and show how they give rise to this optimum energy point.…”
Section: Parallelism and Data Movement Energy (citation type: mentioning; confidence: 99%)
“…The application traffic is the complete set of communication messages between nodes during Bellman-Ford shortest path computations mapped onto finite number of NoC PEs. The original computation is based on a Barrier Synchronized Parallel model that divides computation into separate steps [3]. Since the overall run time for the whole computation depends on each step period, the maximum number of cycles to route a single step is an important metric for performance evaluation.…”
Section: A. Experimental Setup (citation type: mentioning; confidence: 99%)
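The quoted statement describes Bellman-Ford mapped onto a barrier-synchronized step structure, where total runtime is the number of steps times the worst-case step period. A minimal sketch of that structure (an assumption for illustration: function names, the message-count proxy for NoC traffic, and the graph encoding are all invented here, not taken from the cited experimental setup) looks like:

```python
# Illustrative sketch of barrier-synchronized Bellman-Ford (assumed, not
# the cited experimental setup). Each step, every node whose distance
# improved last step sends (dist + weight) along its out-edges; a barrier
# applies all updates together. Since runtime is steps x worst-case step
# period, the peak per-step message volume is the metric of interest.

import math

def bellman_ford_bsp(adj, source):
    """adj: dict node -> list of (successor, edge_weight) pairs."""
    dist = {n: math.inf for n in adj}
    dist[source] = 0
    active = {source}
    steps = 0
    max_msgs_per_step = 0
    while active:
        steps += 1
        msgs = []  # messages crossing the network this step
        for u in active:
            for v, w in adj[u]:
                msgs.append((v, dist[u] + w))
        max_msgs_per_step = max(max_msgs_per_step, len(msgs))
        active = set()
        for v, d in msgs:  # barrier: apply all of this step's updates at once
            if d < dist[v]:
                dist[v] = d
                active.add(v)
    return dist, steps, max_msgs_per_step
```

The returned `max_msgs_per_step` plays the role of the per-step routing load whose worst case bounds the step period in the statement above.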
“…multiprocessors [1], CoRAM [2], sparse graph processing [3], dynamic reconfigurable accelerators [4]). Natively, today's FPGAs provide high dedicated bandwidth with configured interconnect, but only modest dynamically shared bandwidth with hardwired buses [5].…”
Section: Introduction (citation type: mentioning; confidence: 99%)