Proceedings of the 52nd Annual Design Automation Conference 2015
DOI: 10.1145/2744769.2744803
Bandwidth-efficient on-chip interconnect designs for GPGPUs

Abstract: Modern computational workloads require abundant thread level parallelism (TLP), necessitating highly-parallel, manycore accelerators such as General Purpose Graphics Processing Units (GPGPUs). GPGPUs place a heavy demand on the on-chip interconnect between the many cores and a few memory controllers (MCs). Thus, traffic is highly asymmetric, impacting on-chip resource utilization and system performance. Here, we analyze the communication demands of typical GPGPU applications, and propose efficient Network-on-C…

Cited by 64 publications (31 citation statements)
References 16 publications
“…We used GPGPU-Sim [19] to collect detailed application traces and simulated the network and memory traffic on a customized Noxim NoC simulator [21] that integrates our MACRO-NoC architecture model. We obtained traces for 11 CUDA benchmarks [8], [20], each with a different number of kernels and level of memory intensity. We compared our architecture with two prior works that also propose NoC architectures for GPGPUs: [10] and [11] (both are discussed in Section 2). The architecture from [10] is called Direct all-to-all (DA2), while that from [11] is called XY-YX.…”
Section: Methods
Confidence: 99%
“…We obtained traces for 11 CUDA benchmarks [8], [20], each with a different number of kernels and level of memory intensity. We compared our architecture with two prior works that also propose NoC architectures for GPGPUs: [10] and [11] (both are discussed in Section 2). The architecture from [10] is called Direct all-to-all (DA2), while that from [11] is called XY-YX. Figure 8 shows the MC placement we used in our 16-core and 64-core platforms, based on the recommendations on MC placement from [10] and [11].…”
Section: Methods
Confidence: 99%
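The XY-YX scheme named in the citation statements builds on plain dimension-order routing in a 2D mesh. As a minimal sketch (not a reproduction of the router in [11], whose details are not given here), the two orders can be contrasted by computing the hop path each one takes between a source and destination tile; schemes like XY-YX typically assign one order to one traffic class (e.g., requests) and the other order to the reverse class (e.g., replies) so the two directions load different mesh links:

```python
def _steps(a, b):
    """Unit-step coordinate values moving a toward b (empty if a == b)."""
    if a == b:
        return []
    d = 1 if b > a else -1
    return list(range(a + d, b + d, d))

def route_xy(src, dst):
    """XY dimension-order routing: resolve the X dimension fully, then Y."""
    (sx, sy), (dx, dy) = src, dst
    path = [(sx, sy)]
    path += [(x, sy) for x in _steps(sx, dx)]   # travel along X first
    path += [(dx, y) for y in _steps(sy, dy)]   # then along Y
    return path

def route_yx(src, dst):
    """YX dimension-order routing: resolve the Y dimension fully, then X."""
    (sx, sy), (dx, dy) = src, dst
    path = [(sx, sy)]
    path += [(sx, y) for y in _steps(sy, dy)]   # travel along Y first
    path += [(x, dy) for x in _steps(sx, dx)]   # then along X
    return path

# Same hop count, different links traversed:
# route_xy((0, 0), (2, 1)) -> [(0, 0), (1, 0), (2, 0), (2, 1)]
# route_yx((0, 0), (2, 1)) -> [(0, 0), (0, 1), (1, 1), (2, 1)]
```

Both orders are deadlock-free on their own; using them together for disjoint traffic classes (as in request/reply separation) preserves that property while spreading the many-to-few core-to-MC traffic described in the abstract over more links.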