2008 11th IEEE International Conference on Computational Science and Engineering
DOI: 10.1109/cse.2008.46

Application Performance Tuning for Clusters with ccNUMA Nodes

Cited by 9 publications (4 citation statements)
References 11 publications

“…Therefore, all benchmarks have been executed bound to a subset of the available CPU sockets, utilizing all cores on these sockets to simulate systems with different numbers of sockets. For LAMA we have done this with numactl, and for PETSc we have bound the mpd daemon of MPICH to a socket with taskset to force the PETSc processes to use only these specific sockets [15,16]. When using taskset, it has to be taken into account that the numbering of CPU cores is not always as expected [20].…”
Section: Execution
Citation type: mentioning
confidence: 99%
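
The caveat about core numbering in the statement above can be illustrated with a minimal sketch. It assumes a Linux system that exposes CPU topology under `/sys/devices/system/cpu`; the helper names `parse_cpulist`, `cpus_by_socket`, and `to_cpulist` are hypothetical and do not come from the cited papers. The idea is to look up which logical CPU ids belong to which socket before building a CPU list for a binding tool, rather than assuming that consecutive ids share a socket.

```python
from collections import defaultdict
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8-11" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_by_socket(sysfs_root="/sys/devices/system/cpu"):
    """Group logical CPU ids by physical package (socket) id read from sysfs.

    Assumption: each cpuN directory contains topology/physical_package_id,
    as on typical Linux kernels. Returns an empty dict if nothing is found.
    """
    sockets = defaultdict(list)
    for pkg_file in Path(sysfs_root).glob("cpu[0-9]*/topology/physical_package_id"):
        cpu_id = int(pkg_file.parent.parent.name[3:])  # "cpu12" -> 12
        sockets[int(pkg_file.read_text())].append(cpu_id)
    return {sock: sorted(cpus) for sock, cpus in sorted(sockets.items())}


def to_cpulist(cpus):
    """Format CPU ids as a comma-separated list, the form `taskset -c` accepts."""
    return ",".join(str(c) for c in sorted(cpus))


if __name__ == "__main__":
    for socket, cpus in cpus_by_socket().items():
        print(f"socket {socket}: taskset -c {to_cpulist(cpus)} <command>")
```

On a machine where the kernel interleaves ids across sockets, socket 0 might own CPUs 0 and 2 rather than 0 and 1, so the list passed to `taskset -c` should come from a lookup like this one. Tools such as `numactl --cpunodebind` address NUMA nodes directly and avoid building the list by hand.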
“…In [10] the authors present a study of the performance obtained (in relation to the ccNUMA memory) on a Sun Fire server. The paper also proposes some performance tunings that improve application performance by up to 30%.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Multi- and many-core processors exhibit even lower latencies for shared data due to on-chip cache space utilization. Earlier studies showed significant performance issues that arise from mis-handling of cache hierarchies in multi-core based systems [1]. Thus, efficient handling of address translation becomes even more crucial as this overhead may easily become the dominant factor in the overall access time for such architectures.…”
Section: Introduction
Citation type: mentioning
confidence: 99%