IEEE International Symposium on Workload Characterization (IISWC'10) 2010
DOI: 10.1109/iiswc.2010.5650174

Performance of multi-process and multi-thread processing on multi-core SMT processors

Abstract: Many modern high-performance processors support multiple hardware threads in the form of multiple cores and SMT (Simultaneous Multi-Threading). Hence achieving good performance scalability of programs with respect to the numbers of cores (core scalability) and SMT threads in one core (SMT scalability) is critical. To identify a way to achieve higher performance on the multi-core SMT processors, this paper compares the performance scalability with two parallelization models (using multiple processes and using m…

Cited by 9 publications (4 citation statements); references 11 publications.

“…Although the widely available multi-core processors are mainly based on a cache-coherent NUMA design, they differ in the way they have implemented multi-threading to exploit instruction-level and thread-level parallelism. These differences are not only in the size and speed of the cache, but also in the number of threads that can share resources simultaneously [7], the memory-controller mechanism and the inter-processor connector designs that are employed on and off the chip [8]. After selecting the language, the subsequent selection of a virtual machine and/or operating system, from a wide range of options, increases the complexity of the problem even further.…”
Section: Introduction; citation type: mentioning; confidence: 99%
“…On the other hand, the type of functions called on the graphics card, GPU synchronization with the CPU after a run completes, the functions used to measure time, the version and technical specifications of the graphics card, and the number of experiments are important factors that influence the comparison between CPU and GPU running times. Unfortunately, many papers have not mentioned the factors affecting CPU and GPU running time [17][18][19].…”
Section: Related Work; citation type: mentioning; confidence: 99%
“…Chandramowlishwaran, Madduri and Vuduc [10] describe their effort to characterize and tune a fast multipole method (FMM) application on several CMP/CMT systems (Intel Harpertown, AMD Barcelona, and Intel Nehalem). Inoue and Nakatani [11] study the effect of multi-process and multi-thread programming models on CMP/CMT processors for Java benchmarks and PHP applications. The authors conduct their experiments on Sun Niagara T1 (8 cores, 4-way SMT) and on Intel Nehalem (4 cores, 2-way SMT) and conclude that, while core scalability for the two programming models is comparable, SMT scalability is higher for the multi-thread programming model, mainly because of the difference in the number of data TLB misses.…”
Section: Related Work; citation type: mentioning; confidence: 99%