ML accelerators serve a broad spectrum of use cases that impose different requirements on accelerator design for latency, energy, and area. For systolic array-based ML accelerators, these requirements translate into different constraints on Processing Element (PE) array dimensions and SRAM buffer sizes. 3D integration packs more compute or memory into the same 2D footprint, which can be used to build more powerful or more energy-efficient accelerators. However, 3D integration also expands the design space of ML accelerators, because the PE array and SRAM buffers can be partitioned among the vertical tiers in different ways. Moreover, the partitioning approach may have different thermal implications. This work provides a systematic framework for performing system-level design space exploration of 3D systolic accelerators. Using this framework, different 3D-partitioned accelerator configurations are proposed and evaluated. The 3D-stacked accelerator designs are modeled using a hybrid wafer bonding technique with a 3D connection pitch of 1.44 µm. Results show that different partitionings of the systolic array and SRAM buffers in a 4-tier 3D configuration can lead to either a 1.1-3.9X latency reduction or a 1-3X energy reduction compared to a baseline design with the same 2D area footprint. It is also shown that by carefully organizing the systolic array and SRAM tiers in a logic-over-memory arrangement, the temperature rise with 3D across benchmarks can be limited to 6°C.
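To give a rough sense of how tier partitioning enlarges the design space, the following is a toy sketch and not the framework described in this work. It assumes, purely for illustration, that each of the 4 tiers is dedicated entirely to either logic (PE array) or memory (SRAM), and it enumerates the possible bottom-to-top orderings. The function names `enumerate_stackings` and `logic_tiers` are hypothetical, and a real exploration would also vary array dimensions, buffer sizes, and finer-grained splits within a tier.

```python
from itertools import product

TIERS = 4  # tier count taken from the 4-tier configuration in the abstract


def enumerate_stackings(tiers=TIERS):
    """Yield each bottom-to-top assignment of 'logic' or 'memory' per tier.

    Assumption (illustrative only): a tier holds either PE logic or SRAM,
    never both, and a valid stack needs at least one tier of each kind.
    """
    for stack in product(("logic", "memory"), repeat=tiers):
        if "logic" in stack and "memory" in stack:
            yield stack


def logic_tiers(stack):
    """Count how many tiers in the stack are assigned to PE logic."""
    return stack.count("logic")


stackings = list(enumerate_stackings())

# Group the candidate stacks by how many tiers carry compute.
by_logic_count = {}
for stack in stackings:
    by_logic_count.setdefault(logic_tiers(stack), []).append(stack)

print(len(stackings))  # 2**4 - 2 = 14 candidate orderings
for count in sorted(by_logic_count):
    print(count, "logic tier(s):", len(by_logic_count[count]), "orderings")
```

Even under this coarse assumption there are 14 distinct stackings, and stacks with the same logic-to-memory ratio still differ in vertical ordering, which is the degree of freedom the thermal result above depends on.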