2020
DOI: 10.1007/978-3-030-50743-5_7

Shared-Memory Parallel Probabilistic Graphical Modeling Optimization: Comparison of Threads, OpenMP, and Data-Parallel Primitives

Abstract: This work examines the performance characteristics of multiple shared-memory implementations of a probabilistic graphical modeling (PGM) optimization code, which forms the basis for an advanced, state-of-the-art image segmentation method. The work is motivated by the need to accelerate scientific image analysis pipelines in use by experimental science, such as at X-ray light sources, and by the need for platform-portable codes that perform well across many different computational architectures. The pri…
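As context for the comparison the abstract describes, below is a minimal sketch, not the authors' code, of how the same per-node update loop of a PGM optimizer might be written once with explicit std::thread workers and once with OpenMP; the `update_node` kernel and the flat `beliefs` array are hypothetical stand-ins.

```cpp
#include <omp.h>
#include <thread>
#include <vector>
#include <cstddef>

// Hypothetical per-node update of a PGM optimizer (placeholder kernel).
inline void update_node(std::vector<float>& beliefs, std::size_t i) {
    beliefs[i] *= 0.5f;  // stand-in for a real message/belief update
}

// Variant 1: explicit std::thread workers over a static block partition.
void update_threads(std::vector<float>& beliefs, unsigned nthreads) {
    std::vector<std::thread> pool;
    const std::size_t n = beliefs.size();
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([&, t] {
            const std::size_t lo = t * n / nthreads;
            const std::size_t hi = (t + 1) * n / nthreads;
            for (std::size_t i = lo; i < hi; ++i) update_node(beliefs, i);
        });
    }
    for (auto& th : pool) th.join();
}

// Variant 2: the same loop under OpenMP (compile with -fopenmp).
void update_openmp(std::vector<float>& beliefs) {
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < beliefs.size(); ++i)
        update_node(beliefs, i);
}
```

A data-parallel-primitives version (e.g., VTK-m) would instead express `update_node` as a worklet mapped over the array, leaving partitioning and scheduling to the runtime.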

Cited by 4 publications (3 citation statements) · References 25 publications

“…One primary difference between these previous works, except for Perciano et al., 2020 [17], and our work here is the deeper introspection provided by using detailed hardware performance counters. These additional metrics offer the ability to better understand why a given code performs better or worse in a particular set of circumstances, and also help to provide a more sound basis for performance analysis.…”
Section: B. Comparing Traditional and VTK-m Implementations
confidence: 84%
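The counter-based introspection this statement credits can be reproduced in miniature with PAPI's low-level API; the sketch below is an illustration under stated assumptions, not the cited paper's measurement harness, and the `kernel()` workload is invented.

```cpp
#include <papi.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical region of interest; stands in for a PGM update sweep.
static double kernel() {
    double s = 0.0;
    for (long i = 0; i < 10'000'000; ++i) s += 1.0 / (i + 1.0);
    return s;
}

int main() {
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        std::fprintf(stderr, "PAPI init failed\n");
        return EXIT_FAILURE;
    }
    int evset = PAPI_NULL;
    if (PAPI_create_eventset(&evset) != PAPI_OK ||
        PAPI_add_event(evset, PAPI_TOT_INS) != PAPI_OK ||  // retired instructions
        PAPI_add_event(evset, PAPI_TOT_CYC) != PAPI_OK)    // total cycles
        return EXIT_FAILURE;

    long long counts[2];
    PAPI_start(evset);
    double s = kernel();                // measured region
    PAPI_stop(evset, counts);

    std::printf("sum=%g instructions=%lld cycles=%lld CPI=%.3f\n",
                s, counts[0], counts[1],
                static_cast<double>(counts[1]) / counts[0]);
    return 0;
}
```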
“…The three cases we present all exhibit different aspects of why a method might have better or worse runtime than another. In some cases, the way an algorithm is implemented, such as VTK vs. VTK-m, can have a dramatic impact on the overall number of instructions, a fact that is corroborated by other recent studies (cf. [17]). In other cases, the buffer management needed to implement a complex, multi-stage processing pipeline may trigger more memory-movement instructions, which may be more expensive and result in higher CPI values; we see evidence of this in two of the examples.…”
Section: F. Discussion of Results
confidence: 99%
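For reference, the CPI metric invoked above is simply the ratio of elapsed core cycles to retired instructions:

```latex
\mathrm{CPI} = \frac{\text{total core cycles}}{\text{retired instructions}}
```

Higher CPI at a similar instruction count usually indicates stalls, for example from the extra memory movement the authors describe.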
“…The emphasis is not merely on parallelization but also on meticulous optimizations. These refinements strategically curtail layer synchronization overheads and mitigate the intricacies tied to race conditions, ensuring the algorithm's robustness [3]. Concurrently, a discerning evaluation measures the algorithm's performance enhancements, specifically gauging the speedup in relation to the count of engaged threads.…”
Section: Introduction
confidence: 99%
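To make the two concerns in this passage concrete, eliminating a data race and gauging speedup against thread count, here is a minimal OpenMP sketch, assuming a toy reduction workload rather than the cited layered algorithm:

```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

// Hypothetical workload: sum of a transformed array.
double work(const std::vector<double>& x) {
    double sum = 0.0;
    // reduction(+:sum) gives each thread a private partial sum,
    // removing the race a naive shared accumulator would have.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < static_cast<long>(x.size()); ++i)
        sum += x[i] * x[i];
    return sum;
}

int main() {
    std::vector<double> x(1 << 24, 1.5);

    double t1 = 0.0;  // single-thread baseline time
    for (int threads : {1, 2, 4, 8}) {
        omp_set_num_threads(threads);
        double t0 = omp_get_wtime();
        volatile double s = work(x);  // volatile keeps the call from being elided
        double dt = omp_get_wtime() - t0;
        if (threads == 1) t1 = dt;
        std::printf("threads=%d time=%.3fs speedup=%.2f (sum=%g)\n",
                    threads, dt, t1 / dt, static_cast<double>(s));
    }
    return 0;
}
```

Speedup is computed as S(p) = T(1)/T(p); markedly sublinear values as p grows are the usual fingerprint of the synchronization overheads the passage mentions.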