OpenCL is a widely adopted open standard for general purpose programming of diverse heterogeneous parallel platforms that can harness various device types such as CPUs, DSPs, GPUs, FPGAs and hardware accelerators. It is an extensive and explicit low level API serving well as a platform portability layer. However, using OpenCL for diverse heterogeneous programming in multi-vendor platforms is not practical due to device vendors each providing their own OpenCL implementations which do not interoperate efficiently, leading to inefficient execution coordination and collaborative execution between various device types from different vendors.To this end, this paper proposes a vendor-independent open source method for integration of custom FPGA components to a common OpenCL platform. The method relies on a streamlined memory-mapped hardware control interface implemented by the integrated components. The required OpenCL driver integration is then automatically provided, enabling easy inclusion of different types of FPGA accelerators to the control of a single OpenCL runtime.The ease of integration and portability is demonstrated by integrating two hardware devices in two different FPGA devices. The resource overhead of the hardware interface is shown to be negligible and the clock frequency overheads small enough to not pose efficiency challenges.
Hardware specialization is a well-known means to significantly improve the performance and energy efficiency of various application domains. Modern computing systems consist of multiple specialized processing devices which need to collaborate with each other to execute common tasks. New heterogeneous programming abstractions have been created to program heterogeneous systems. Even though many of these abstractions are open vendor-independent standards, cross-vendor interoperability between different implementations is limited since the vendors typically do not have commercial motivations to invest in it. Therefore, getting good performance from vendor-independent heterogeneous programming standards has proven difficult for systems with multiple different device types. In order to help unify the field of heterogeneous programming APIs for platforms with hardware accelerators from multiple vendors, we propose a new software abstraction for hardware-accelerated tasks based on the open OpenCL programming standard. In the proposed abstraction, we rely on the built-in kernel feature of the OpenCL specification to define a portability layer that stores enough information for automated accelerator utilization. This enables the portability of high-level applications to a diverse set of accelerator devices with minimal programmer effort. The abstraction enables a layered software architecture that provides for an efficient combination of application phases to a single asynchronous application description from multiple domain-specific input languages. As proofs of the abstraction layer serving its purpose for the layers above and below it, we show how a domain-specific input description ONNX can be implemented on top of this portability abstraction, and how it also allows driving fixed function and FPGA-based hardware accelerators below in the hardware-specific backend. We also provide an example implementation of the abstraction to show that the abstraction layer does not seem to incur significant execution time overhead.
Modern High Level Synthesis (HLS) tools succeed well in their engineering productivity goal, but still require toolset and target technology specific modifications to the source code to guide the process towards an efficient implementation. Furthermore, their end result is a fixed function accelerator with limited field and runtime flexibility. In this paper we describe the status of AEx, a novel work-in-progress HLS tool developed in the FitOptiVis ECSEL JU project. AEx is based on automated exploration of architectures using a flexible and lightweight parallel co-processor template. We compare its current performance in CHStone C-language benchmarks to the state of the art FPGA HLS tool Vitis, provide ASIC implementation numbers, and identify the main remaining toolset features that are expected to dramatically further improve the performance. The potential is explored with a hand-optimized case study that shows only 1.64x performance slowdown with the programmable co-processor in comparison to the fixed function Vitis HLS result.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.