Kofler, K.; Grasso, I.; Cosenza, B.; Fahringer, T.: An automatic input-sensitive approach for heterogeneous task partitioning. ABSTRACTUnleashing the full potential of heterogeneous systems, consisting of multi-core CPUs and GPUs, is a challenging task due to the difference in processing capabilities, memory availability, and communication latencies of different computational resources.In this paper we propose a novel approach that automatically optimizes task partitioning for different (input) problem sizes and different heterogeneous multi-core architectures. We use the Insieme source-to-source compiler to translate a single-device OpenCL program into a multi-device OpenCL program. The Insieme Runtime System then performs dynamic task partitioning based on an offline-generated prediction model. In order to derive the prediction model, we use a machine learning approach based on Artificial Neural Networks (ANN) that incorporates static program features as well as dynamic, input sensitive features. Principal component analysis have been used to further improve the task partitioning. Our approach has been evaluated over a suite of 23 programs and respectively achieves a performance improvement of 22% and 25% compared to an execution of the benchmarks on a single CPU and a single GPU which is equal to 87.5% of the optimal performance.
Memory optimizations have became increasingly important in order to fully exploit the computational power of modern GPUs. The data arrangement has a big impact on the performance, and it is very hard for GPU programmers to identify a well-suited data layout. Classical data layout transformations include grouping together data fields that have similar access patterns, or transforming Array-of-Structures (AoS) to Structure-of-Arrays (SoA). This paper presents an optimization infrastructure to automatically determine an improved data layout for OpenCL programs written in AoS layout. Our framework consists of two separate algorithms: The first one constructs a graph-based model, which is used to split the AoS input struct into several clusters of fields, based on hardware dependent parameters. The second algorithm selects a good per-cluster data layout (e.g., SoA, AoS or an intermediate layout) using a decision tree. Results show that the combination of both algorithms is able to deliver higher performance than the individual algorithms. The layouts proposed by our framework result in speedups of up to 2.22, 1.89 and 2.83 on an AMD FirePro S9000, NVIDIA GeForce GTX 480 and NVIDIA Tesla k20m, respectively, over different AoS sample programs, and up to 1.18 over a manually optimized program.
N-body simulations represent an important class of numerical simulations in order to study a wide range of physical phenomena for which researchers demand fast and accurate implementations. Due to the computational complexity, simple brute-force methods to solve the longdistance interaction between bodies can only be used for smallscale simulations. Smarter approaches utilize neighbor lists, tree methods or other hierarchical data structures to reduce the complexity of the force calculations. However, such data structures have complex building algorithms which hamper their parallelization for GPUs.In this paper, we introduce a novel method to effectively parallelize N-body simulations for GPU architectures. Our method is based on an efficient, three-phase, parallel Kd-tree building algorithm and a novel volume-mass heuristic to reduce the simulation time and increase accuracy.Experiments demonstrate that our approach is the fastest monopole implementation with an accuracy that is comparable with state of the art implementations (GADGET-2). In particular, we are able to reach a simulation speed of up to 3 Mparticles/s on a single GPU for the force calculation, while still having a relative force error below 0.4% for 99% of the particles. We also show competitive performance with existing GPU implementations, while our competitor shows worse accuracy behavior as well as a higher energy error during time integration.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.