The Intel Xeon Phi offers a promising solution to coprocessing, since it is based on the popular x86 instruction set. However, to fully utilize its potential, applications must be vectorized to leverage the wide SIMD lanes, in addition to exploiting large-scale shared memory parallelism effectively. Compared to the SIMT execution model on GPGPUs with CUDA or OpenCL, SIMD parallelism with an SSE-like instruction set imposes many restrictions, and has generally not benefited applications involving branches, irregular accesses, or even reductions in the past. In this paper, we consider the problem of accelerating applications involving different communication patterns on Xeon Phis, with an emphasis on effectively using available SIMD parallelism. We offer an API for both shared memory and SIMD parallelization, and demonstrate its implementation. We use implementations of overloaded functions as a mechanism for providing SIMD code, which is assisted by runtime data reordering and our methods to effectively manage control flow. Our extensive evaluation with 6 popular applications shows large gains over the SIMD parallelization achieved by the production (ICC) compiler, and we even outperform OpenMP for MIMD parallelism.
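The idea of supplying SIMD code through overloaded functions, with control flow handled by masks rather than branches, can be illustrated with a minimal sketch. Everything named here (`Vec4`, `Mask4`, `select`, `branch_free_kernel`) is hypothetical and is not the API described in the abstract; the 4-lane type is emulated with plain arrays, where a real build would wrap hardware vector registers.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane vector type (assumption: emulated with an array).
struct Vec4 {
    std::array<float, 4> lane;
};

// Hypothetical per-lane mask produced by a vector comparison.
struct Mask4 {
    std::array<bool, 4> lane;
};

// Lane-wise multiply.
inline Vec4 operator*(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Lane-wise comparison yields a mask instead of a single bool.
inline Mask4 operator<(const Vec4& a, const Vec4& b) {
    Mask4 m{};
    for (std::size_t i = 0; i < 4; ++i) m.lane[i] = a.lane[i] < b.lane[i];
    return m;
}

// Scalar overload: an ordinary conditional.
inline float select(bool m, float if_true, float if_false) {
    return m ? if_true : if_false;
}

// Vector overload: both sides are computed, the mask picks per lane.
inline Vec4 select(const Mask4& m, const Vec4& if_true, const Vec4& if_false) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r.lane[i] = m.lane[i] ? if_true.lane[i] : if_false.lane[i];
    }
    return r;
}

// One kernel source for both widths: "if (x < zero) x *= slope" written
// without a branch, so overload resolution picks scalar or vector code.
template <typename T>
T branch_free_kernel(const T& x, const T& zero, const T& slope) {
    return select(x < zero, x * slope, x);
}
```

Calling `branch_free_kernel` with `float` arguments resolves to the scalar overloads, while `Vec4` arguments resolve to the lane-wise ones, so the branch is replaced by a per-lane selection in the vector case.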
GPUs have rapidly emerged as a very significant player in high performance computing. However, despite the popularity of CUDA, there are significant challenges in porting different classes of HPC applications to modern GPUs. This paper focuses on the challenges of implementing irregular applications arising from unstructured grids on modern NVIDIA GPUs. Considering the importance of irregular reductions in scientific and engineering codes, substantial effort was made in developing compiler and runtime support for parallelization or optimization of these codes in the previous two decades, with different efforts targeting distributed memory machines, distributed shared memory machines, shared memory machines, or cache performance improvement on uniprocessor machines. However, there have not been any systematic studies on parallelizing these applications on modern GPUs. There are at least two significant challenges associated with porting this class of applications to modern GPUs. The first is related to correct and efficient parallelization while using a large number of threads. The second challenge is effective use of shared memory. Since data accesses cannot be determined statically, runtime partitioning methods are needed for effectively using the shared memory. This paper describes an execution methodology that can address the above two challenges. We have also developed optimized runtime modules to support our execution methodology. Our approach and runtime methods have been extensively evaluated using two indirection array based applications.
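To make the notion of an irregular reduction with runtime partitioning concrete, here is a small sequential sketch. It assumes one common scheme, partitioning the reduction space (the node array) into fixed-size chunks and bucketing edges by the chunks they update; this is an illustration and may differ from the execution methodology the abstract describes. The names `Edge`, `partition_edges`, and `reduce_partitioned` are hypothetical, and the per-partition `local` buffer merely stands in for GPU shared memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One edge of an unstructured grid: the indirection pair (a, b) is only
// known at run time. Each edge adds w to node a and subtracts w from node b.
struct Edge {
    std::size_t a;
    std::size_t b;
    double w;
};

// Runtime step: bucket edge ids by the node partitions they update.
// An edge crossing two partitions is listed in both (assumes part_size > 0).
std::vector<std::vector<std::size_t>> partition_edges(
    const std::vector<Edge>& edges, std::size_t num_nodes,
    std::size_t part_size) {
    const std::size_t num_parts = (num_nodes + part_size - 1) / part_size;
    std::vector<std::vector<std::size_t>> parts(num_parts);
    for (std::size_t e = 0; e < edges.size(); ++e) {
        const std::size_t pa = edges[e].a / part_size;
        const std::size_t pb = edges[e].b / part_size;
        parts[pa].push_back(e);
        if (pb != pa) parts[pb].push_back(e);
    }
    return parts;
}

// Each partition accumulates into a small local buffer covering only its
// own node range, then writes that range back once.
std::vector<double> reduce_partitioned(const std::vector<Edge>& edges,
                                       std::size_t num_nodes,
                                       std::size_t part_size) {
    const auto parts = partition_edges(edges, num_nodes, part_size);
    std::vector<double> result(num_nodes, 0.0);
    for (std::size_t p = 0; p < parts.size(); ++p) {
        const std::size_t lo = p * part_size;
        const std::size_t hi = std::min(lo + part_size, num_nodes);
        std::vector<double> local(hi - lo, 0.0);  // stand-in for shared memory
        for (std::size_t e : parts[p]) {
            const Edge& ed = edges[e];
            // Update only the endpoints owned by this partition.
            if (ed.a >= lo && ed.a < hi) local[ed.a - lo] += ed.w;
            if (ed.b >= lo && ed.b < hi) local[ed.b - lo] -= ed.w;
        }
        for (std::size_t i = 0; i < local.size(); ++i) result[lo + i] = local[i];
    }
    return result;
}
```

Because every partition owns a disjoint node range, partitions never write to the same output element, at the cost of processing cross-partition edges twice. On a GPU, each partition would plausibly map to one thread block, though that mapping is an assumption of this sketch.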
Graphics processing units (GPUs) have rapidly emerged as a very significant player in high performance computing. Single instruction multiple thread (SIMT) pipelines are typically used in GPUs to exploit parallelism and maximize performance. Although support for unstructured control flow has been included in GPUs, efficiently managing thread divergence for arbitrary parallel programs remains a critical challenge. In this paper, we focus on the problem of supporting recursion in modern GPUs. We design and comparatively evaluate various algorithms to manage thread divergence encountered in recursive programs. The results improve upon traditional post-dominator based reconvergence mechanisms designed to handle thread divergence due to control flow within a procedure.