A Practical Approach for Performance Analysis of Shared-Memory Programs

Tudor, Bogdan Marius; Teo, Yong Meng

doi:10.1109/ipdps.2011.68

Cited by 17 publications (18 citation statements)

References: 31 publications


“…Using extensive measurement analysis on state-of-the-art multicore systems [8], [9], we conclude that for weak-scaling programs the last-level misses, number of cycles unrelated to memory contention and work cycles do not change significantly when n changes. Furthermore, we study the pattern of burstiness of the memory requests and conclude that large parallel programs do not exhibit bursty memory traffic [9].…”

Section: Model of Memory Contention (mentioning)

“…We define ω(n), the memory contention factor, as the ratio of the number of threads busy due to memory overhead to the number of threads busy due to useful work. With these notations, following the derivations described in [8], the speedup of a shared-memory program is:…”

Section: Parallelism and Energy Performance Models: A Model Of P… (mentioning)
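A minimal sketch of the ω(n) definition quoted above, assuming the two thread counts are already available as averages over a run on n cores. How those counts are measured is not specified in the quote, so the function and parameter names here are hypothetical:

```python
def memory_contention_factor(threads_memory_overhead: float,
                             threads_useful_work: float) -> float:
    """Return omega(n): threads busy due to memory overhead divided by
    threads busy due to useful work (both assumed to be run averages)."""
    if threads_useful_work <= 0:
        raise ValueError("threads_useful_work must be positive")
    if threads_memory_overhead < 0:
        raise ValueError("threads_memory_overhead must be non-negative")
    return threads_memory_overhead / threads_useful_work
```

For example, if on average 2 threads are stalled on memory while 6 do useful work, ω(n) is 2/6, or about 0.33; a value of 0 means no memory contention at all.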

“…The average prediction error across more than two hundred experiments with four problem sizes, and core counts ranging from 2 to 48, is 9% for UMA systems and 14% for NUMA systems. The accuracy of the model is strongly determined by the problem size, with best accuracy for larger problem sizes, due to large steady-state compute phases [8] and uniform memory access patterns [9].…”

Section: E. Limitations (mentioning)
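A sketch of how an average prediction error per system type might be computed, assuming it is the mean absolute relative error between predicted and measured values. The quote does not state the exact error metric, and the function name and sample data below are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def error_by_system(experiments):
    """experiments: iterable of (system_type, predicted, measured) tuples,
    e.g. predicted vs. measured speedup. Returns the mean absolute relative
    error for each system type."""
    errors = defaultdict(list)
    for system, predicted, measured in experiments:
        if measured == 0:
            raise ValueError("measured value must be non-zero")
        errors[system].append(abs(predicted - measured) / abs(measured))
    return {system: mean(errs) for system, errs in errors.items()}
```

With hypothetical speedups [("UMA", 3.8, 4.0), ("UMA", 7.0, 7.5), ("NUMA", 10.0, 12.5)], the UMA error is (0.05 + 0.0667) / 2, about 5.8%, and the NUMA error is 20%.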

“…The practicality of the models stems from their generality across programming languages/threading packages and architectures, and from new insights on the decrease of memory burstiness in multicore systems, for large problem sizes. These insights are based on extensive experiments on state-of-the-art multicore systems, with core counts up to 4 in ARM systems, and 48 in commodity UMA and NUMA systems [8], [9]. 2) Using predictions of the model, we optimize execution of parallel programs by adjusting the number of cores or the core frequency in systems with dynamic frequency scaling.…”

Section: Introduction (mentioning)

“…2) Using predictions of the model, we optimize execution of parallel programs by adjusting the number of cores or the core frequency in systems with dynamic frequency scaling. The optimality criteria range from fastest execution time to minimum energy usage, and trade-offs between these two [8].…”

Section: Introduction (mentioning)
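A sketch of the configuration-selection step, assuming a model has already produced a predicted execution time and average power for each (core count, frequency) pair. Energy is taken as power times time, and the energy-delay product (EDP) stands in for the time/energy trade-off; EDP is an assumed choice, not necessarily the criterion used in the cited work. All names and numbers are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    cores: int
    freq_ghz: float
    time_s: float   # predicted execution time (hypothetical model output)
    power_w: float  # predicted average power (hypothetical model output)

    @property
    def energy_j(self) -> float:
        # Energy = average power x execution time.
        return self.power_w * self.time_s


def best_config(configs, criterion: str = "time") -> Config:
    """Pick the configuration minimizing time, energy, or EDP."""
    keys = {
        "time": lambda c: c.time_s,
        "energy": lambda c: c.energy_j,
        "edp": lambda c: c.energy_j * c.time_s,  # assumed trade-off metric
    }
    if criterion not in keys:
        raise ValueError(f"unknown criterion: {criterion}")
    return min(configs, key=keys[criterion])
```

With hypothetical predictions for 48 cores at 2.1 GHz (10 s, 300 W), 32 cores at 1.8 GHz (12 s, 200 W), and 24 cores at 1.4 GHz (20 s, 110 W), the three criteria pick three different configurations: the fastest is 48 cores, the lowest energy is 24 cores at 1.4 GHz (2200 J), and the lowest EDP is 32 cores (28,800 J·s).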


Towards Modelling Parallelism and Energy Performance of Multicore Systems

Tudor; Teo

2012

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum

Self-citation

Abstract: Multicore systems are increasingly adopted across many application domains. Consequently, understanding their performance is becoming an important issue for a growing number of users. However, performance analysis of parallel programs on multicore systems is still challenging, especially for large programs or applications developed in multiple programming languages. This paper proposes an analytical modelling approach for studying the parallelism and energy performance of shared-memory programs on multicore systems. The proposed model derives the speedup and speedup loss from data dependency and memory overhead in traditional UMA and NUMA multicore systems, and in emerging platforms such as ARM multicores. Using only widely available inputs derived from traces of the operating system run-queue and hardware event counters, the proposed model achieves high practicality and generality across many types of shared-memory programs running on different multicore platforms. Applications of the model include understanding achieved speedup and parallelism loss, and predicting the optimal core and memory configuration, where the optimality criterion is minimum execution time, minimum energy usage, or a trade-off between the two.
