Providing cost-effective on-chip network bandwidth in GPGPUs

Kim, Hanjoon; Kim, John; Seo, Woong; Cho, Yeongon; Ryu, Soojung

doi:10.1109/iccd.2012.6378671

Cited by 33 publications

(24 citation statements)

References 13 publications

“…To avoid protocol deadlock, we increase the number of VCs per port, where different types of packets traverse on-chip networks via different VCs. It is noted that additional VCs employed to avoid a protocol deadlock can affect the critical path of a router since VC allocation is the bottleneck in the router pipeline [11]. However, we observe that two separate VCs under a single physical network degrades system performance less than 0.03% in geometric mean across 25 benchmarks.…”

Section: Methods (mentioning)

confidence: 82%

“…When the resources are naïvely shared by both packets, avoiding protocol deadlock requires that reply packets must not compete for the same resources as request packets. To avoid this, prior studies [4,3,11] suggest partitioning NoCs equally into two…”

Section: VC Monopolizing and Asymmetric VC Partitioning (mentioning)

confidence: 95%

“…Although NoC design has matured in this domain [9,14], NoC design for GPGPUs is still in its infancy. Only a handful of works have examined the impact of NoC design in GPGPU systems [3,11,13,15].…”

Section: Introduction (mentioning)

confidence: 99%

“…Therefore, when we design a bandwidth-efficient NoC, the asymmetry of its on-chip traffic must be considered. In prior work [3,4,11], the on-chip network is partitioned into two independent, equally divided (logical or physical) subnetworks between different types of packets to avoid cyclic dependencies that might cause protocol deadlocks. Due to the asymmetric traffic in GPGPUs skewed heavily towards reply packets, however, such partitioning can lead to imbalanced use of NoC resources given in each subnetwork.…”

Section: Introduction (mentioning)

confidence: 99%
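
The contrast drawn in the statement above, between an equal split and a split that follows the request/reply traffic mix, can be illustrated with a minimal sketch. It is not the method of either paper: the function name `partition_vcs`, the proportional rule, and the traffic shares used below are all assumptions chosen for illustration. The only property taken from the quoted text is that request and reply packets never share a virtual channel, which is what avoids protocol deadlock.

```python
from __future__ import annotations


def partition_vcs(total_vcs: int, request_share: float) -> tuple[int, int]:
    """Split a port's VCs between request and reply packet classes.

    Hypothetical helper (not from the cited papers). Each class keeps at
    least one dedicated VC so requests and replies never compete for the
    same buffer; the remaining VCs follow the assumed traffic share.
    """
    if total_vcs < 2:
        raise ValueError("need at least one VC per packet class")
    if not 0.0 <= request_share <= 1.0:
        raise ValueError("request_share must be between 0 and 1")

    # Proportional allocation, clamped so neither class ends up with zero VCs.
    request_vcs = round(total_vcs * request_share)
    request_vcs = max(1, min(total_vcs - 1, request_vcs))
    reply_vcs = total_vcs - request_vcs
    return request_vcs, reply_vcs


# Equal partitioning, as in the prior work the statement refers to.
equal_split = partition_vcs(4, 0.5)

# Reply-heavy traffic (the 25% request share is an illustrative value only).
skewed_split = partition_vcs(4, 0.25)
```

With four VCs, the equal split gives each class two channels, while the assumed reply-heavy mix gives one channel to requests and three to replies.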

“…The advent of parallel programming models, such as CUDA and OpenCL, makes it easier to program graphics/non-graphics applications, making GPGPUs an excellent computing platform. The growing quantity of parallelism and the fast scaling of GPGPUs have fueled an increasing demand for performance-efficient on-chip fabrics finely tuned for GPGPU cores and memory systems [3,11].…”

Section: Introduction (mentioning)

confidence: 99%

Bandwidth-efficient on-chip interconnect designs for GPGPUs

Jang, Kim, Gratz, et al. 2015

Proceedings of the 52nd Annual Design Automation Conference

Modern computational workloads require abundant thread level parallelism (TLP), necessitating highly-parallel, manycore accelerators such as General Purpose Graphics Processing Units (GPGPUs). GPGPUs place a heavy demand on the on-chip interconnect between the many cores and a few memory controllers (MCs). Thus, traffic is highly asymmetric, impacting on-chip resource utilization and system performance. Here, we analyze the communication demands of typical GPGPU applications, and propose efficient Network-on-Chip (NoC) designs to meet those demands. We show that the proposed schemes improve performance by up to 64.7%. Compared to the best of class prior work, our VC monopolizing and partitioning schemes improve performance by 25%.
