Summary
Multicore NUMA systems present on‐board memory hierarchies and communication networks that influence performance when executing shared memory parallel codes. Characterizing this influence is complex, and understanding the effect of particular hardware configurations on different codes is of paramount importance. In this article, monitoring information extracted from hardware counters at runtime is used to characterize the behavior of each thread for an arbitrary number of multithreaded processes running in a multiprocessing environment. This characterization is given in terms of number of operations per second, operational intensity, and latency of memory accesses. We propose a runtime tool, executed in user space, that uses this information to guide two different thread migration strategies for improving execution efficiency by increasing locality and affinity without requiring any modification of the running codes. Different configurations of NAS Parallel OpenMP benchmarks running concurrently on multicore NUMA systems, with up to four processes running simultaneously, were used to validate the benefits of our proposal. In more than 95% of the executions, our tool outperforms the operating system (OS), and it improves execution time by up to 38% over the OS for heterogeneous workloads, under different and realistic locality and affinity scenarios.
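As a minimal sketch of the three per-thread metrics named in the summary, the Python snippet below derives them from one interval of counter readings. The `CounterSample` fields and the `characterize` function are hypothetical names chosen for illustration, and operational intensity is assumed here to be operations per byte of memory traffic; the article's tool may define and collect these quantities differently.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical per-thread counter readings over one sampling interval."""
    operations: int        # operations counted in the interval (assumed counter)
    bytes_moved: int       # bytes transferred to or from memory (assumed counter)
    latency_cycles: int    # summed latency of the sampled memory accesses
    sampled_accesses: int  # number of memory accesses that were sampled
    interval_s: float      # length of the sampling interval in seconds


def characterize(sample: CounterSample) -> tuple[float, float, float]:
    """Return (operations per second, operational intensity, mean latency)."""
    ops_per_second = sample.operations / sample.interval_s
    # Assumed definition: operations per byte of memory traffic.
    if sample.bytes_moved:
        intensity = sample.operations / sample.bytes_moved
    else:
        intensity = float("inf")
    # Mean latency, in cycles, over the sampled accesses.
    if sample.sampled_accesses:
        mean_latency = sample.latency_cycles / sample.sampled_accesses
    else:
        mean_latency = 0.0
    return ops_per_second, intensity, mean_latency
```

A migration strategy could then compare these tuples across threads, for example to identify a thread with low operational intensity and high latency as a candidate for moving closer to its data.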
In this work, we introduce a heterogeneous scheme for computing iterative (or time‐step) methods based on finite differences, using an image denoising problem as a case study. The idea is to dynamically split the problem domain into smaller regions based on CPU and GPU performance, balancing the workload between them. Results show that this approach improves execution times compared with using only the GPU, which is typically faster than the CPU for this kind of problem. In our experiments, performance improvements range from 3%, in scenarios where the CPU can handle only a small portion of the workload, to more than 30%, when the CPU can take on more work.
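The dynamic split described above can be sketched as a proportional rebalancing step. This is a minimal Python illustration, assuming the domain is divided by rows and that each device's throughput is estimated from the time it took on the previous iteration; `split_rows` and its parameters are hypothetical and are not taken from the work itself.

```python
def split_rows(total_rows: int,
               cpu_rows: int, cpu_time: float,
               gpu_rows: int, gpu_time: float) -> tuple[int, int]:
    """Return (rows for CPU, rows for GPU) for the next iteration.

    Assumes each device's throughput is the number of rows it processed
    in the previous iteration divided by the time it needed.
    """
    cpu_rate = cpu_rows / cpu_time  # rows per second on the CPU
    gpu_rate = gpu_rows / gpu_time  # rows per second on the GPU
    # Give each device a share proportional to its measured throughput.
    next_cpu_rows = round(total_rows * cpu_rate / (cpu_rate + gpu_rate))
    return next_cpu_rows, total_rows - next_cpu_rows


# Previous iteration: the CPU needed 2.0 s for 100 rows, the GPU 1.0 s for 900.
print(split_rows(1000, cpu_rows=100, cpu_time=2.0, gpu_rows=900, gpu_time=1.0))
```

Repeating this step every few iterations lets the split follow changes in device performance, so that both devices finish their regions at roughly the same time.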