Python is popular among numeric communities that value it for easy-to-use number-crunching modules like [NumPy], [SciPy], [Dask], [Numba], and many others. These modules often use multi-threading to utilize all the available CPU cores. Nevertheless, when several such modules are used together in one application, their threads can interfere with each other, leading to overhead and inefficiency. The loss of performance can be prevented if all the multi-threaded parties are coordinated. This paper describes the use of Intel® Threading Building Blocks (Intel® TBB), an open-source cross-platform library for multi-core parallelism [TBB], as the composability layer for Python modules. It helps to unlock additional performance for numeric applications on multi-core systems.
Python is popular among scientific communities that value its simplicity and power, especially as it comes with numeric libraries such as [NumPy], [SciPy], [Dask], and [Numba]. As CPU core counts keep increasing, these modules can use many cores via multi-threading. However, when such modules are used together in a single application on machines with a large number of cores, their threads can interfere with each other, leading to overhead and inefficiency. This performance loss can be prevented if all multi-threaded modules are coordinated. This paper continues the work started in [AMala16] by introducing more approaches to coordination for both multi-threading and multi-processing cases. In particular, we investigate the use of static settings, limiting the number of simultaneously active [OpenMP] parallel regions, and optional parallelism with Intel® Threading Building Blocks (Intel® [TBB]). We will show how these approaches help to unlock additional performance for numeric applications on multi-core systems.
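The arithmetic behind the oversubscription problem and the "static settings" approach mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: the function names are hypothetical, and the even split of cores among outer workers is an assumed policy. The only external convention relied on is the standard OpenMP environment variable OMP_NUM_THREADS, which has an effect only if it is set before the OpenMP runtime of a threaded library is initialized.

```python
import os


def total_threads(outer_workers: int, inner_threads: int) -> int:
    """Threads created when each outer worker starts its own inner pool."""
    return outer_workers * inner_threads


def static_thread_budget(total_cores: int, outer_workers: int) -> int:
    """Inner threads per outer worker so the total stays within total_cores.

    Assumed policy: split the cores evenly and never go below one thread.
    """
    if total_cores < 1 or outer_workers < 1:
        raise ValueError("total_cores and outer_workers must be positive")
    return max(1, total_cores // outer_workers)


def apply_static_setting(total_cores: int, outer_workers: int, env=None) -> int:
    """Write the budget to OMP_NUM_THREADS in env (os.environ by default)."""
    env = os.environ if env is None else env
    budget = static_thread_budget(total_cores, outer_workers)
    env["OMP_NUM_THREADS"] = str(budget)
    return budget


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    workers = 4  # hypothetical number of outer (application-level) workers
    # Uncoordinated case: every worker lets its library use all cores.
    print("uncoordinated threads:", total_threads(workers, cores))
    # Static setting: cap each worker's inner parallelism up front.
    budget = apply_static_setting(cores, workers, env={})
    print("per-worker budget:", budget)
    print("coordinated threads:", total_threads(workers, budget))
```

A fixed split like this is simple but cannot rebalance when some workers finish early, which is presumably why the abstract also lists dynamic approaches alongside it.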
Abstract: For more than thirty years the code PRIZMA has been used at RFNC-VNIITF for solving radiation transport problems with the Monte Carlo method. The code models the separate and coupled transport of neutrons, photons, electrons, positrons and ions in one-, two-, and three-dimensional geometry. For criticality calculations the code implements the method of generations with a constant number of fission sites in one generation. Now the code is extending its capabilities for nuclear reactor calculations. The paper describes the current status of the code and gives examples of its application to particle transport in nuclear reactors and other physical facilities.
Abstract: It is well known that the performance difference between Python and basic C code can be up to 200x, but for numerically intensive code another speedup factor of 240x or even greater is possible. The performance comes from the software's ability to take advantage of a CPU's multiple cores, single instruction multiple data (SIMD) instructions, and high-performance caches. The article describes optimizations, included in Intel® Distribution for Python*, that aim to automatically boost the performance of numerically intensive code. This paper is intended for Python programmers who want to get the most out of their hardware but do not have the time or expertise to re-code their applications using techniques such as native extensions or Cython.