2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
DOI: 10.1109/micro.2018.00010
Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling

Abstract: Graph processing is increasingly bottlenecked by main memory accesses. On-chip caches are of little help because the irregular structure of graphs causes seemingly random memory references. However, most real-world graphs offer significant potential locality; it is just hard to predict ahead of time. In practice, graphs have well-connected regions where relatively few vertices share edges with many common neighbors. If these vertices were processed together, graph processing would enjoy significant data reuse. …
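
The idea can be illustrated with a minimal software sketch. The code below schedules vertices of a CSR graph in a bounded depth-first order so that vertices in the same well-connected region are processed close together in time, letting their shared neighbors stay cache-resident. It is an illustrative assumption, not the paper's hardware mechanism or exact algorithm: the CSRGraph layout, the boundedDFSOrder name, and the maxDepth bound are hypothetical choices made only for this example.

// Illustrative sketch of locality-aware traversal scheduling in software.
// Assumptions: CSR adjacency layout, a simple depth bound to keep the
// traversal inside a local region. Not the paper's hardware design.
#include <cstdint>
#include <utility>
#include <vector>
#include <stack>

struct CSRGraph {
    std::vector<uint64_t> rowPtr;   // offsets into colIdx, size = numVertices + 1
    std::vector<uint32_t> colIdx;   // concatenated neighbor lists
    uint32_t numVertices() const { return static_cast<uint32_t>(rowPtr.size() - 1); }
};

// Produce a processing order that explores each unvisited vertex's
// neighborhood depth-first up to maxDepth before moving on to the next
// vertex in sequence, grouping well-connected vertices together.
std::vector<uint32_t> boundedDFSOrder(const CSRGraph& g, uint32_t maxDepth) {
    std::vector<bool> visited(g.numVertices(), false);
    std::vector<uint32_t> order;
    order.reserve(g.numVertices());

    for (uint32_t root = 0; root < g.numVertices(); ++root) {
        if (visited[root]) continue;
        std::stack<std::pair<uint32_t, uint32_t>> stk;  // (vertex, depth)
        stk.push({root, 0});
        while (!stk.empty()) {
            auto [v, depth] = stk.top();
            stk.pop();
            if (visited[v]) continue;
            visited[v] = true;
            order.push_back(v);
            if (depth == maxDepth) continue;  // bound exploration to stay local
            for (uint64_t e = g.rowPtr[v]; e < g.rowPtr[v + 1]; ++e) {
                uint32_t nbr = g.colIdx[e];
                if (!visited[nbr]) stk.push({nbr, depth + 1});
            }
        }
    }
    return order;
}

An order produced this way could drive an ordinary vertex-centric kernel in place of the default sequential order; a plain vertex-ordered loop touches the same neighbor data but spreads the accesses far apart in time, which is why it sees little cache reuse.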

Cited by 108 publications (45 citation statements). References 49 publications.
“…Specifically, difference in speed-ups for DBG and Gorder is very small for datasets kr, tw, wl and mp. These datasets have relatively small clustering coefficient compared to other datasets [37], which makes it difficult for Gorder to approximate suitable vertex ordering. On other datasets, Gorder provides significantly higher speed-ups than any skew-aware techniques.…”
Section: A. Performance Excluding Reordering Time (mentioning, confidence: 99%)
“…Though locality is often present in these workloads [8], standard techniques to reduce data movement struggle. Irregular prefetchers [44,47,96] can hide data access latency, but they do not reduce overall data movement [62]. Moreover, irregular workloads are poorly suited to common accelerator designs [18,65].…”
Section: Data Movement Is a Growing Problem (mentioning, confidence: 99%)
“…Beyond irregular computations, we believe that Memory Services can accelerate a wide range of tasks, such as background systems (e.g., garbage collection [60], data dedup [86]), cache optimization (e.g., sophisticated cache organizations [77,80,81], specialized prefetchers [6,98,99]), as well as other functionality that is prohibitively expensive in software today (e.g., work scheduling [62], fine-grain memoization [28,102]). We leave these to future work.…”
Section: Introduction (mentioning, confidence: 99%)
“…Traversal scheduling: Mukkara et al proposed HATS [6], a hardware accelerator implementing locality-aware scheduling to exploit cache locality for graphs exhibiting community structure. While effective, it requires intrusive hardware changes, including a specialized hardware unit with each core and an ISA change on the host core.…”
Section: Related Work (mentioning, confidence: 99%)