Scalable interconnects for reconfigurable spatial architectures

Zhang, Yaqi; Rucker, Alexander; Vilim, Matthew; Prabhakar, Raghu; Hwang, William; Olukotun, Kunle

doi:10.1145/3307650.3322249

Cited by 21 publications

(9 citation statements)

References 48 publications


“…We evaluate Capstan's performance using a custom cycle-accurate C++ simulator; our simulator models the effects of a hybrid static-dynamic network [75] and uses Ramulator [38] to model DRAM behavior. The simulator is validated against microbenchmarks with known performance characteristics.…”

Section: Discussion (mentioning)

confidence: 99%

“…However, due to halo exchange, convolution maps poorly to positional dataflow: in a streaming-positional architecture, each tile's accumulation buffer would need eight links (input and output) to neighboring tiles. Although we can map convolution to the shuffle network using Spatial (using 100% of the on-chip shuffle resources), using the dynamic network [75] in a non-positional (i.e., out-of-order) mode yields 3.8 times higher performance. Without manual mapping, Capstan still outperforms a CPU and GPU; however, manual mapping is used to compare against SCNN (which uses a similar tiled architecture).…”

Section: Discussion (mentioning)

confidence: 99%

“…AGs send burst-level (64 B) requests to a global controller, which performs low-level DRAM scheduling (precharge, row open, etc.). Units are connected by a loosely-timed interconnection network with per-link buffering to avoid global synchronicity; it provides vector (512-bit) and scalar (32-bit) links for efficient mapping [75]. Network buffering provides timing flexibility for Capstan's reordered memory accesses, which may be delayed for several cycles to increase available parallelism.…”

Section: Capstan Architectural Parameters (mentioning)

confidence: 99%

“…Scheduling accesses to avoid structural hazards is complicated by positional dataflow, the typical paradigm for RDAs. In positional dataflow, senders and receivers are synchronized, and loop indices are communicated implicitly via the sequence of data elements [75]. A compute graph can thus be mapped to parallel, pipelined execution units without reordering data elements or sending control information across the network.…”

Section: Introduction (mentioning)

confidence: 99%
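
The positional-dataflow idea quoted above (loop indices carried implicitly by element order) can be illustrated with a small software sketch. This is not code from either paper: the function names, the row-major ordering, and the use of Python generators as stand-ins for network links are all assumptions made for illustration. The tagged variant shows the non-positional alternative, where each element carries its index and arrival order no longer matters.

```python
from itertools import product


def positional_send(matrix):
    """Stream only the values, in loop-nest (row-major) order (hypothetical sender)."""
    for row in matrix:
        for value in row:
            yield value


def positional_receive(stream, rows, cols):
    """Recover (i, j) from arrival order alone; no index crosses the link."""
    indices = product(range(rows), range(cols))
    return {(i, j): v for (i, j), v in zip(indices, stream)}


def tagged_send(matrix):
    """Non-positional alternative: every element carries its own index."""
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            yield (i, j), value


matrix = [[10, 11, 12], [20, 21, 22]]

# Positional: the receiver must see elements in exactly the sender's order.
received = positional_receive(positional_send(matrix), rows=2, cols=3)

# Tagged: elements may arrive in any order (reversed here) and still land correctly.
reordered = dict(reversed(list(tagged_send(matrix))))
```

Both receivers end up with the same index-to-value mapping. The positional stream sends no index information at all but depends on in-order delivery, while the tagged stream tolerates reordering at the cost of sending an index with every element.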


Capstan: A Vector RDA for Sparsity

Rucker, Vilim, Zhao et al. 2021

Preprint

Self Cite


This paper proposes Capstan: a scalable, parallel-patterns-based, reconfigurable-dataflow accelerator (RDA) for sparse and dense tensor applications. Instead of designing for one application, we start with common sparse data formats, each of which supports multiple applications. Using a declarative programming model, Capstan supports application-independent sparse iteration and memory primitives that can be mapped to vectorized, high-performance hardware. We optimize random-access sparse memories with configurable out-of-order execution to increase SRAM random-access throughput from 32% to 80%. For a variety of sparse applications, Capstan with DDR4 memory is 22× faster than a multi-core CPU baseline, while Capstan with HBM2 memory is 17× faster than an Nvidia V100 GPU. For sparse applications that can be mapped to Plasticine, a recent dense RDA, Capstan is 7.6× to 365× faster and only 13% larger.

“…Other open research challenges in the design of accelerators include supporting software definable interconnects and logic blocks [317] with run-time reconfiguration and dynamic resource allocation. In contrast to an FPGA, a GPU can switch between threads at run-time and thus can be instantaneously reconfigured to run different tasks.…”

Section: Summary of Accelerators (mentioning)

confidence: 99%

Hardware-Accelerated Platforms and Infrastructures for Network Functions: A Survey of Enabling Technologies and Research Studies

2020


In order to facilitate flexible network service virtualization and migration, network functions (NFs) are increasingly executed by software modules as so-called "softwarized NFs" on General-Purpose Computing (GPC) platforms and infrastructures. GPC platforms are not specifically designed to efficiently execute NFs with their typically intense Input/Output (I/O) demands. Recently, numerous hardware-based accelerations have been developed to augment GPC platforms and infrastructures, e.g., the central processing unit (CPU) and memory, to efficiently execute NFs. This article comprehensively surveys hardware-accelerated platforms and infrastructures for executing softwarized NFs. This survey covers both commercial products, which we consider to be enabling technologies, as well as relevant research studies. We have organized the survey into the main categories of enabling technologies and research studies on hardware accelerations for the CPU, the memory, and the interconnects (e.g., between CPU and memory), as well as custom and dedicated hardware accelerators (that are embedded on the platforms); furthermore, we survey hardware-accelerated infrastructures that connect GPC platforms to networks (e.g., smart network interface cards). We find that the CPU hardware accelerations have mainly focused on extended instruction sets and CPU clock adjustments, as well as cache coherency. Hardware accelerated interconnects have been developed for on-chip and chip-to-chip connections. Our comprehensive up-to-date survey identifies the main trade-offs and limitations of the existing hardware-accelerated platforms and infrastructures for NFs and outlines directions for future research.
