2015
DOI: 10.1016/j.anucene.2014.08.038

Memory bottlenecks and memory contention in multi-core Monte Carlo transport codes

Abstract: We have extracted a kernel that executes only the most computationally expensive steps of the Monte Carlo particle transport algorithm - the calculation of macroscopic cross sections - in an effort to expose bottlenecks within multi-core, shared memory architectures.
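
The kernel in question isolates the macroscopic cross section lookups that dominate reactor-physics Monte Carlo runtimes. As a rough illustration of what such a lookup involves (this sketch is not the authors' code; the data layout, grid structures, and names are assumptions), the hot path is a per-nuclide binary search into a large energy grid followed by interpolation and accumulation:

```c
/* Illustrative sketch of a macroscopic cross section lookup kernel.
 * NOT the authors' code; data layout, sizes, and names are assumptions
 * chosen only to show where the memory traffic comes from. */
#define N_XS 5  /* e.g. total, elastic, absorption, fission, nu-fission (assumed) */

typedef struct {
    double energy;
    double xs[N_XS];     /* microscopic cross sections at this grid point */
} GridPoint;

typedef struct {
    long       n_points;
    GridPoint *points;   /* energy-ordered table, typically far larger than cache */
} NuclideGrid;

/* Binary search for the grid interval containing energy E:
 * a chain of dependent loads into a large table, hence latency bound. */
static long grid_search(const NuclideGrid *g, double E)
{
    long lo = 0, hi = g->n_points - 1;
    while (hi - lo > 1) {
        long mid = (lo + hi) / 2;
        if (g->points[mid].energy > E) hi = mid;
        else                           lo = mid;
    }
    return lo;
}

/* Macroscopic XS for one material at energy E:
 * sum over nuclides of (number density) x (interpolated microscopic XS). */
void macro_xs(const NuclideGrid *grids, const int *nuclides, const double *densities,
              int n_nuclides, double E, double *sigma_t /* length N_XS */)
{
    for (int k = 0; k < N_XS; k++) sigma_t[k] = 0.0;

    for (int n = 0; n < n_nuclides; n++) {
        const NuclideGrid *g = &grids[nuclides[n]];
        long i = grid_search(g, E);          /* random, latency-bound access */
        const GridPoint *lo = &g->points[i], *hi = &g->points[i + 1];
        double f = (E - lo->energy) / (hi->energy - lo->energy);
        for (int k = 0; k < N_XS; k++)
            sigma_t[k] += densities[n] * (lo->xs[k] + f * (hi->xs[k] - lo->xs[k]));
    }
}
```

The dependent loads inside grid_search, scattered across tables much larger than cache, are what make this kernel a probe of the memory system rather than of the floating-point units.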

Cited by 23 publications (13 citation statements). References 12 publications.
“…Significant speedup of simulation performance can be easily achieved by enabling OpenMP in MC codes, but scaling degradation at high core counts is attributed to a complex mix of hardware and software factors, such as memory bottlenecks (Tramm and Siegel, 2013; Siegel et al., 2014).…”
Section: Related Work
confidence: 99%
“…Recent efforts to parallelize MC codes also include leveraging multi-core architectures (Tramm and Siegel, 2013; Siegel et al., 2014) and graphics processing units (GPUs) (Boyd et al., 2013). Significant speedup of simulation performance can be easily achieved by enabling OpenMP in MC codes, but scaling degradation at high core counts is attributed to a complex mix of hardware and software factors, such as memory bottlenecks (Tramm and Siegel, 2013; Siegel et al., 2014).…”
Section: Related Work
confidence: 99%
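
As these citing authors note, the first step is usually just an OpenMP pragma over independent histories or lookups. A minimal sketch of that pattern (table size, lookup count, and the loop body are placeholders, not any particular code base) shows why speedup is easy to get and why it can stop scaling: every core issues uncorrelated reads into the same shared data.

```c
/* Sketch only: the usual way OpenMP is dropped into a Monte Carlo lookup loop.
 * Table size, lookup count, and the "lookup" body are placeholders; the point
 * is that every thread issues independent random reads into one shared table. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    const long n_table   = 1L << 23;        /* 64 MB of doubles; assumed larger than last-level cache */
    const long n_lookups = 100L * 1000 * 1000;
    double *table = malloc(n_table * sizeof *table);
    for (long i = 0; i < n_table; i++) table[i] = (double)i;

    double checksum = 0.0;
    double t0 = omp_get_wtime();

    /* Lookups are independent, so enabling OpenMP is essentially one pragma... */
    #pragma omp parallel for reduction(+:checksum)
    for (long i = 0; i < n_lookups; i++) {
        unsigned long s = (unsigned long)i * 2654435761UL;  /* cheap hash as a random index */
        checksum += table[s % n_table];     /* ...but all cores contend for the same
                                               memory system, which is where scaling
                                               degrades at high core counts */
    }

    printf("checksum %.1f in %.2f s\n", checksum, omp_get_wtime() - t0);
    free(table);
    return 0;
}
```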
“…On one hand, Doppler broadening introduces compute-intensive FLOP work between frequent memory loads to mitigate the latency-bound bottleneck mainly induced by the binary search in the pre-tabulated cross section approach [10]. On the other hand, temperature-dependent cross section data are computed on the fly whenever they are requested, which significantly reduces the memory footprint of the program.…”
Section: Benchmark
confidence: 99%
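
The trade-off described here can be pictured as two access patterns. The sketch below is illustrative only: the on-the-fly branch uses a placeholder functional form, not an actual Doppler broadening kernel, and all names and parameters are assumptions.

```c
/* Schematic contrast of the two access patterns discussed above.
 * The "on-the-fly" body is a stand-in (a short series evaluation),
 * NOT a real Doppler broadening kernel; it only illustrates trading
 * extra floating-point work for fewer, smaller table loads. */
#include <math.h>

/* (a) Pre-tabulated: one binary search per lookup into a large,
 *     pre-broadened table -> few FLOPs, many dependent memory loads. */
double xs_pretabulated(const double *energy_grid, const double *xs_grid,
                       long n, double E)
{
    long lo = 0, hi = n - 1;
    while (hi - lo > 1) {                    /* latency-bound dependent loads */
        long mid = (lo + hi) / 2;
        if (energy_grid[mid] > E) hi = mid; else lo = mid;
    }
    double f = (E - energy_grid[lo]) / (energy_grid[lo + 1] - energy_grid[lo]);
    return xs_grid[lo] + f * (xs_grid[lo + 1] - xs_grid[lo]);
}

/* (b) On-the-fly: reconstruct the temperature-dependent value from a much
 *     smaller set of parameters -> many FLOPs per load, small footprint. */
double xs_on_the_fly(const double *coeffs, int n_coeffs, double E, double T)
{
    double x = E / (T + 1.0), acc = 0.0;     /* placeholder functional form */
    for (int k = 0; k < n_coeffs; k++)
        acc += coeffs[k] * exp(-k * x);      /* compute-heavy, cache-resident */
    return acc;
}
```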
“…Parallel algorithms for Monte Carlo methods on distributed memory systems have been a vibrant area of research for many decades, including recent advances in distributed fission banks (Romano and Forget, 2013) and spatial domain decomposition (Horelik et al, 2014). However, studies in on-node parallelism have pointed to some key issues—scaling limitations due to memory contention (Siegel et al, 2014; Tramm and Siegel, 2013) and the difficulty of formulating Monte Carlo approaches with SIMD parallelism (Nelson, 2009).…”
Section: Introduction
confidence: 99%
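
The SIMD difficulty mentioned in this statement stems from the history-based structure of the algorithm. A toy sketch (the physics and names are stand-ins) shows the data-dependent branching and variable trip counts that prevent adjacent particles from being processed in lock-step vector lanes:

```c
/* Sketch of why history-based Monte Carlo resists SIMD: adjacent particles take
 * data-dependent paths of different lengths, so loop iterations diverge and a
 * compiler cannot pack them into vector lanes. The physics here is a toy stand-in. */
typedef struct { double E; unsigned int seed; int alive; } Particle;

static double toy_rand(unsigned int *seed)   /* tiny LCG, illustration only */
{
    *seed = *seed * 1664525u + 1013904223u;
    return (*seed >> 8) * (1.0 / 16777216.0);
}

void transport(Particle *p)
{
    while (p->alive) {                       /* unpredictable trip count per particle */
        double xi = toy_rand(&p->seed);
        if (xi < 0.7) {                      /* "scatter": lose some energy */
            p->E *= 0.5 + 0.5 * toy_rand(&p->seed);
            if (p->E < 1.0e-5) p->alive = 0;
        } else {                             /* "absorb": history ends */
            p->alive = 0;
        }                                    /* divergent branches across would-be lanes */
    }
}
```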