Enhancing the Programmability and Performance Portability of GPU Tensor Operations

Mazaheri, Arya; Schulte, Johannes H.; Moskewicz, Matthew W.; Wolf, Felix; Jannesari, Ali

doi:10.1007/978-3-030-29400-7_16

Cited by 7 publications

(5 citation statements)

References 10 publications


“…Other authors have proposed the idea of unified semantics among different GPGPU APIs targeting OpenCL and CUDA as compilation backends [19], and proposing a framework offering a unified specification with easy-to-use abstractions for managing compute and data resources. Narrowing the application target to tensor operations, other approaches have investigated an abstraction layer for deep neural networks, capable of generating CUDA, OpenCL and Vulkan code [20].…”

Section: Related Work (mentioning)

confidence: 99%

A Taxonomy of Modern GPGPU Programming Methods: On the Benefits of a Unified Specification

Capodieci

Cavicchioli

Marongiu

2022

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.


Several Application Programming Interfaces (APIs) and frameworks have been proposed to simplify the development of General-Purpose GPU (GPGPU) applications. GPGPU application development typically involves specific customization for the target operating systems and hardware devices. The effort to port applications from one API to the other (or to develop multi-target applications) is complicated by the availability of a plethora of specifications, which in essence offers very similar underlying functionality. In this work we provide an in-depth study of six state-of-the-art GPGPU APIs. From these we derive a taxonomy of the common semantics and propose a unified specification. We describe a methodology to translate this unified specification into different target APIs. This simplifies cross-platform application development and provides a clean framework for benchmarking. Our proposed unified specification is called GUST (GPGPU Unified Specification and Translation) and it captures common functionality found in compute-only APIs (e.g., CUDA and OpenCL), in the compute pipeline of traditional graphic-oriented APIs (e.g., OpenGL and Direct3D11) and in last-generation bare-metal APIs (e.g., Vulkan and Direct3D12). The proposed translation methodology solves differences between specific APIs in a transparent manner, without hiding available tuning knobs for compute kernel optimizations and fostering best programming practices in a simple manner.


“…Scalability reflects the ability to support a massive amount of data [202], [203]. Portability refers to the flexibility of the workload to be transportable across core, edge, and endpoint deployments [204]. Timing describes analyzing streaming databases in a real-time or near-realtime manner by involving advanced computing technologies such as DL accelerators [205], [206].…”

Section: F Communication Infrastructures Protocols and Investments (mentioning)

confidence: 99%

Deep Learning in Smart Grid Technology: A Review of Recent Advancements and Future Prospects

et al. 2021


The current electric power system witnesses a significant transition into Smart Grids (SG) as a promising landscape for high grid reliability and efficient energy management. This ongoing transition undergoes rapid changes, requiring a plethora of advanced methodologies to process the big data generated by various units. In this context, SG stands tied very closely to Deep Learning (DL) as an emerging technology for creating a more decentralized and intelligent energy paradigm while integrating high intelligence in supervisory and operational decision-making. Motivated by the outstanding success of DL-based prediction methods, this article attempts to provide a thorough review from a broad perspective on the state-of-the-art advances of DL in SG systems. Firstly, a bibliometric analysis has been conducted to categorize this review's methodology. Further, we taxonomically delve into the mechanism behind some of the trending DL algorithms. We then showcase the DL enabling technologies in SG, such as federated learning, edge intelligence, and distributed computing. Finally, challenges and research frontiers are provided to serve as guidelines for future work in the futuristic power grid domain. This study's core objective is to foster the synergy between these two fields for decision-makers and researchers to accelerate DL's practical deployment for SG systems.

Index Terms: Smart grid, deep learning, deep neural networks, edge computing, distributed and federated learning, power systems.

Nomenclature (abbreviations): DDL, distributed deep learning; DL, deep learning; DRL, deep reinforcement learning; DRN, deep residual network; EI, edge intelligence; EPS, electric power systems; FL, federated learning; IoT, Internet of things; LSTM, long short-term memory neural network; NN, neural network; PVPF, photovoltaic power forecasting.

The associate editor coordinating the review of this manuscript and approving it for publication was Shadi Alawneh.

D. RESEARCH METHODOLOGY AND SYSTEMATIC REVIEW PROTOCOL

Starting from September 2019, the multiple-methods approach was conducted [24]. Mainstream research papers on SG/AI were collected from Web of Science (WoS), Scopus, IEEE Xplore, Science Direct, and Google Scholar, the largest databases of peer-reviewed articles. Only peer-reviewed articles written in English, providing experimental results, and having a unique identifier from the mentioned databases were taken into consideration, including reviews, research articles, patent reports, and conference proceedings. The adopted methodology for conducting this review article employs a combination of keywords categorized into three main groups, specifically, 'Deep Learning', 'Smart Grid', and 'Prediction'. The search methodology focuses on the recent research articles from 2015-2020 to identify the comprehensive status of the AI applications on SG. The filtering process results in 220 research papers from 600 related papers selected based on their relevance by reading the title, abstract, conclusion,


“…Upcoming SoC-FPGAs platforms (e.g., Xilinx Versal) combine these heterogeneous resources, but challenges remain with respect to hardware support for safety-critical systems such as predictable interconnects, avoidance of temporal interference in memory and safety monitors. For example, while the portability to different GPU architectures and programming interfaces was addressed in prior work [195], portability to other resource types and the simultaneous usage of heterogeneous computing resources is also considered a challenge, with few works currently addressing this challenge [196].…”

Section: Heterogeneous Computing Platforms (mentioning)

confidence: 99%

Multi-core Devices for Safety-critical Systems

et al. 2020


Multi-core devices are envisioned to support the development of next-generation safety-critical systems, enabling the on-chip integration of functions of different criticality. This integration provides multiple system-level potential benefits such as cost, size, power, and weight reduction. However, safety certification becomes a challenge and several fundamental safety technical requirements must be addressed, such as temporal and spatial independence, reliability, and diagnostic coverage. This survey provides a categorization and overview at different device abstraction levels (nanoscale, component, and device) of selected key research contributions that support the compliance with these fundamental safety requirements.
