Evaluation of OpenMP for the Cyclops Multithreaded Architecture

Almási, George; Ayguadé, Eduard; Caşcaval, Călin; Castaños, José G.; Labarta, Jesús; Martínez, Francisco Javier Zaragoza; Martorell, Xavier; Moreira, José E.

doi:10.1007/3-540-45009-2_6

Cited by 5 publications

(7 citation statements)

References 25 publications

“…The reported values are for 2 threads per processor and have been normalized with respect to the single-thread per processor execution of each benchmark on the specific architecture and number of processors. This way, the graphs emphasize the effects of using a second… [Footnote 1: All the NAS applications we used are iterative. The computational routines are enclosed in an external, sequential loop.]…”

Section: Results (mentioning)

confidence: 98%
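
The normalization described in the statement above can be sketched roughly as follows. This is only an illustration, not the citing authors' code: the function name, the dictionary layout, and the sample numbers are all hypothetical, and the quoted text does not say which metric was normalized.

```python
def normalize_to_baseline(two_thread, one_thread):
    """Divide each 2-threads-per-processor value by the matching
    1-thread-per-processor value for the same benchmark.

    Both arguments map a benchmark name to a measured value taken on
    the same architecture and processor count (hypothetical layout).
    """
    return {name: two_thread[name] / one_thread[name] for name in two_thread}


if __name__ == "__main__":
    # Made-up values, for illustration only.
    baseline = {"CG": 10.0, "MG": 8.0}
    with_second_thread = {"CG": 5.0, "MG": 12.0}
    print(normalize_to_baseline(with_second_thread, baseline))
```

A normalized value of 1.0 means the second thread changed nothing; whether values below or above 1.0 are better depends on whether the underlying metric is a time or a rate.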

“…Earlier research efforts have ported and evaluated OpenMP on specific processor designs, including heterogeneous chip multiprocessors [14], slipstream processors [9] (a form of 2-way chip multiprocessors in which the second core is used for speculative runahead execution) and Cyclops, a fine-grain multithreaded processor architecture introduced by IBM [1]. Our evaluation focuses on commodity processors, with organizations spanning the design space between simultaneous multithreading and chip multiprocessors and a few execution contexts.…”

Section: Related Work (mentioning)

confidence: 99%

“…Full system simulation with Simics introduces an average 7000-fold slowdown in the execution time of applications, compared with the execution on a real machine. We simulated the same application binaries, using the same data sets, however we reduced the number of iterations¹ we ran on the simulator in order to limit the execution time to reasonable levels. More specifically, we executed only 3 of the outermost iterations of each benchmark, discarding the results from the first iteration in order to eliminate transient effects due to cache warmup.…”

Section: Hardware and Software Environment and Configuration (mentioning)

confidence: 99%
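
The warm-up handling described in the statement above (run 3 outermost iterations, discard the first) might look like the sketch below. The function name and sample timings are hypothetical, and averaging the remaining iterations is an assumption: the quoted text only says that the first iteration's results were discarded.

```python
def steady_state_mean(iteration_values, warmup=1):
    """Drop the first `warmup` iterations and average the rest.

    `iteration_values` holds one measurement per outermost iteration.
    Averaging the kept iterations is an assumption of this sketch.
    """
    if len(iteration_values) <= warmup:
        raise ValueError("need at least one iteration after warm-up")
    kept = iteration_values[warmup:]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Made-up timings for 3 outermost iterations; the first one is
    # inflated by cold caches and is excluded from the result.
    print(steady_state_mean([9.0, 4.0, 6.0]))
```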

An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors

Curtis-Maury, Ding, Antonopoulos, et al. 2008

OpenMP Shared Memory Parallel Programming

Abstract. Multiprocessors based on simultaneous multithreaded (SMT) or multicore (CMP) processors are continuing to gain a significant share in both high-performance and mainstream computing markets. In this paper we evaluate the performance of OpenMP applications on these two parallel architectures. We use detailed hardware metrics to identify architectural bottlenecks. We find that the high level of resource sharing in SMTs results in performance complications when more than one thread is assigned to a single physical processor. CMPs, on the other hand, are an attractive alternative. Our results show that the exploitation of the multiple processor cores on each chip results in significant performance benefits. We evaluate an adaptive, run-time mechanism which provides limited performance improvements on SMTs; however, the inherent bottlenecks remain difficult to overcome. We conclude that out-of-the-box OpenMP code scales better on CMPs than SMTs. To maximize the efficiency of OpenMP on SMTs, new capabilities are required by the runtime environment and/or the programming interface.

“…In [1] the authors evaluated the performance of some of the NAS benchmarks on Cyclops. The performance metric used to evaluate performance is speedup with respect to the sequential execution.…”

Section: Analyzing the Current Results (mentioning)

confidence: 99%

“…Our tools rely on the Cyclops developer toolkit and simulator to evaluate performance. An initial evaluation of the porting was presented in [1]. This paper proposes two different kinds of optimizations (software and hardware) to run applications on the Cyclops architecture.…”

Section: Introduction (mentioning)

confidence: 99%

Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture

Rodenas, Martorell, Almási, et al.

19th IEEE International Parallel and Distributed Processing Symposium

In this paper, we present two approaches to improve the execution of OpenMP applications on the IBM Cyclops multithreaded architecture. The two solutions are independent, and both aim for better performance through better management of cache locality. The first solution is based on software modifications to the OpenMP runtime library to balance stack accesses across all data caches. The second solution is a small hardware modification to change the data cache mapping behavior, with the same goal. Both solutions help parallel applications improve scalability and obtain better performance on this kind of architecture. In fact, they could also be applied to future multi-core processors. We have executed (using simulation) some of the NAS benchmarks to evaluate these proposals. The results show that, with small changes in both the software and the hardware, we achieve very good scalability in parallel applications. Our results also show that standard execution environments oriented to multiprocessor architectures can be easily adapted to exploit multithreaded processors.
