Workload characterization for the design of future servers

Maron, B.; Chen, T.; Vianney, Duc J.; Olszewski, B.; Kunkel, S.; Mericas, Alex E.

doi:10.1109/iiswc.2005.1526009

Cited by 8 publications

(11 citation statements)

References 8 publications

“…There are many previous studies on the CPU and memory behavior of commercial and high performance computing applications [9,4,14,13,12]. Our work differs from these in the type of analysis we are conducting and the angle from which we are viewing the performance data.…”

Section: Related Work (mentioning)

confidence: 86%

“…We analyze the CPU/memory performance using a multi-granularity Cycles Per Instruction (CPI) model [12]. This model highlights the critical resources that are underutilized by these benchmarks as well as it shows where each application is spending the processing cycles as it passes through the multi-stage, multi-unit complex processor core.…”

Section: Introduction (mentioning)

confidence: 99%

Workload Performance Characterization of DARPA HPCS Benchmarks

Seelam, Chung, Cong, et al. 2008

2008 10th IEEE International Conference on High Performance Computing and Communications

Summary: It is critical to understand the workload characteristics and resource usage patterns of available applications to guide the design and development of hardware and software stacks of future machines. In this article, we analyze the workload performance characteristics of three large-scale DARPA HPCS benchmarks: Hybrid Coordinate Ocean Model, Parallel Ocean Program, and Lattice Boltzmann Magneto-Hydrodynamics Code while executing on IBM Power5+ processor machines. Our analysis is focused on the CPU/memory performance using a Cycles Per Instruction (CPI) model and multiprocess communication performance using MPI traces. For each benchmark, we provide a high-level performance analysis followed by the hotspot analysis for selected input parameters. Then we present a detailed workload performance characterization using the CPI model with data from a unique set of performance counters available on the Power5+ processor system. From communication performance analysis, we describe the sources of load imbalances in the applications and identify the potential impediments to the scalability of the applications under large processor counts. We identify several sources of performance problems that are potential bottlenecks and discuss methods to ameliorate them. We also present a comparative analysis of these benchmarks to summarize the similarities and differences in their performance characteristics.

“…The Power5+ upgrade to the Power5 chip is a speculative, out-of-order execution core with simultaneous multithreading (SMT) and deep multi-stage pipeline structure [2]. It has a dedicated Performance Monitoring Unit (PMU) that can count up to six events.…”

Section: Multi-granularity CPI Breakdown Model of Power5+ Processor (mentioning)

confidence: 99%

Workload performance characterization of DARPA HPCS benchmarks

Seelam, Chung, Cong, et al. 2009

Concurrency and Computation

It is critical to understand the workload characteristics and resource usage patterns of available applications to guide the design and development of hardware and software stacks of future machines. In this article, we analyze the workload performance characteristics of three large-scale DARPA HPCS benchmarks: Hybrid Coordinate Ocean Model, Parallel Ocean Program, and Lattice Boltzmann Magneto-Hydrodynamics Code while executing on IBM Power5+ processor machines. Our analysis is focused on the CPU/memory performance using a Cycles Per Instruction (CPI) model and multiprocess communication performance using MPI traces. For each benchmark, we provide a high-level performance analysis followed by the hotspot analysis for selected input parameters. Then we present a detailed workload performance characterization using the CPI model with data from a unique set of performance counters available on the Power5+ processor system. From communication performance analysis, we describe the sources of load imbalances in the applications and identify the potential impediments to the scalability of the applications under large processor counts. We identify several sources of performance problems that are potential bottlenecks and discuss methods to ameliorate them. We also present a comparative analysis of these benchmarks to summarize the similarities and differences in their performance characteristics.

“…We chose the two latest Java server benchmarks (SPECjAppServer2004 [20] and SPECjbb2005 [18]) for our investigation. Most of the characterization studies [3,8,13,14,15] using SPECjbb or SPECjAppServer benchmarks are somewhat outdated since they use older versions (like SPECjbb2000 [17]) or other benchmarks (like SPECjvm98 [19]). In addition, the platforms that the workloads were characterized on were not CMP-based.…”

Section: Introduction (mentioning)

confidence: 99%

Addressing Cache/Memory Overheads in Enterprise Java CMP Servers

Shiv, Iyer, Bhat, et al. 2007

2007 IEEE 10th International Symposium on Workload Characterization

As we enter the era of chip multiprocessor (CMP) architectures, it is important that we explore the scaling characteristics of mainstream server workloads on these platforms. In this paper, we analyze the performance of two significant Enterprise Java workloads (SPECjAppServer2004 and SPECjbb2005) on CMP platforms, present and future. We start by characterizing the core, cache and memory behavior of these workloads on the newly released Intel Core 2 Duo Xeon platform (dual-core, dual-socket). Our findings from these measurements indicate that these workloads have a significant performance dependence on cache and memory subsystems. In order to guide the evolution of future CMP platforms, we perform a detailed investigation of potential cache and memory architecture choices. This includes analyzing the effects of thread sharing and migration, object allocation and garbage collection. Based on the observed behavior, we propose architectural optimizations along three dimensions: (a) data-less cache line initialization (DCLI), (b) hardware-guided thread collocation (HGTC) and (c) on-socket DRAM caches (OSDC). In this paper, we will describe these optimizations in detail and validate their performance potential based on trace-driven simulations and execution-driven emulation. Overall, we expect that the findings in this paper will guide future CMP architectures for Enterprise Java servers.
