Thermal characterization of cloud workloads on a power-efficient server-on-chip

Milojevic, Dragomir; Idgunji, Sachin; Jevdjic, Djordje; Özer, Emre; Lotfi-Kamran, Pejman; Panteli, Andreas; Prodromou, Andreas; Nicopoulos, Chrysostomos; Hardy, Damien; Falsafi, Babak; Sazeides, Yiannakis

doi:10.1109/iccd.2012.6378637

Cited by 20 publications

(6 citation statements)

References 20 publications


“…For the purposes of this case study, the assumed baseline server-on-chip architecture is based on the scale-out processor [15] for a single-pod chip [17]. The architecture of a pod contains 16 processing tiles interconnected using a 4×4 mesh with 16 six-port routers.…”

Section: Experimental Methodology (mentioning)

confidence: 99%

Toward Multi-Layer Holistic Evaluation of System Designs

Kleanthous

Sazeides

Özer

et al. 2016

IEEE Comput. Arch. Lett.

Self Cite


The common practice for quantifying the benefit(s) of design-time architectural choices of server processors is often limited to the chip or server level. This quantification process invariably entails the use of salient metrics, such as performance, power, and reliability, which capture, in a tangible manner, a design's overall ramifications. This paper argues for the necessity of a more holistic evaluation approach, which considers metrics across multiple integration levels (chip, server, and datacenter). In order to facilitate said comprehensive evaluation, we utilize an aggregate metric, e.g. the Total Cost of Ownership (TCO), to harness the complexity of comparing multiple metrics at multiple levels. We motivate our proposition for holistic evaluation with a case study that compares a 2D processor to a 3D processor at various design integration levels. We show that while a 2D processor is clearly the best choice at the processor level, the conclusion is reversed at the datacenter level, where the 3D processor becomes a better choice. This result emanates mainly from the performance benefits of processor-DRAM 3D integration, and the ability to amortize (at the datacenter level) the higher 3D per-server cost and lower reliability by requiring fewer 3D servers to match the same performance.



“…The normal operating temperature for HBM2 DRAM dies is 105 °C [68], and we conservatively assume the DRAM dies in our case operates under 85 °C. A prior study on 3D PIM thermal analysis [78] shows that active cooling solutions can effectively satisfy this thermal constraint (85 °C). Both commodity-server active cooling solution [46] (peak power density allowed: 706 mW/mm²) and high-end-server active cooling solution [20] (peak power density allowed: 1214 mW/mm²) can be used.…”

Section: B Performance Area Energy and Thermal Analysis (mentioning)

confidence: 99%

MPU: Towards Bandwidth-abundant SIMT Processor via Near-bank Computing

Xie

Ding

et al. 2021

Preprint


With the growing number of data-intensive workloads, GPU, which is the state-of-the-art single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed 3D-stacking near-bank computing accelerators benefit from abundant bank-internal bandwidth by bringing computations closer to the DRAM banks. However, these accelerators are specialized for certain application domains with simple architecture data paths and customized software mapping schemes. For general purpose scenarios, lightweight hardware designs for diverse data paths, architectural supports for the SIMT programming model, and end-to-end software optimizations remain challenging. To address these issues, we propose MPU (Memory-centric Processing Unit), the first SIMT processor based on 3D-stacking near-bank computing architecture. First, to realize diverse data paths with small overheads while leveraging bank-level bandwidth, MPU adopts a hybrid pipeline with the capability of offloading instructions to near-bank compute-logic. Second, we explore two architectural supports for the SIMT programming model, including a near-bank shared memory design and a multiple activated row-buffers enhancement. Third, we present an end-to-end compilation flow for MPU to support CUDA programs. To fully utilize MPU's hybrid pipeline, we develop a backend optimization for the instruction offloading decision. The evaluation results of MPU demonstrate 3.46× speedup and 2.57× energy reduction compared with an NVIDIA Tesla V100 GPU on a set of representative data-intensive workloads.


“…Therefore, our study focuses on the HMC-style design. Most previous work focuses on investigating the thermal feasibility of implementing PIMs in memory stack [13,55,42,49]. However, when the memory stack is integrated with the host CPU in a single package, the thermal feasibility of the processor-memory PIM system remains largely unexplored.…”

Section: Thermal Issues With Processing In Die-stacking In-package Me… (mentioning)

confidence: 99%

Integrated Thermal Analysis for Processing In Die-Stacking Memory

Zhu

Wang

Liu

et al. 2016

Proceedings of the Second International Symposium on Memory Systems


Recent application and technology trends bring a renaissance of the processing-in-memory (PIM), which was envisioned decades ago. In particular, die-stacking and silicon interposer technologies enable the integration of memory, PIMs, and the host CPU in a single chip. Yet the integration substantially increases system power density. This can impose substantial thermal challenges to the feasibility of such systems. In this paper, we comprehensively study the thermal feasibility of integrated systems consisting of the host CPU, die-stacking DRAMs, and various types of PIMs. Compared with most previous thermal studies that only focus on the memory stack, we investigate the thermal distribution of the whole processor-memory system. Furthermore, we examine the feasibility of various cooling solutions and feasible scale of various PIM designs under given thermal and area constraints. Finally, we demonstrate system run-time thermal feasibility by executing two high-performance computing applications with PIM-based systems. Based on our experimental studies, we reveal a set of thermal implications for PIM-based system design and configuration.

