Execution of Dataflow Process Networks on OpenCL Platforms

Lund, Wictor; Kanur, Sudeep; Ersfolk, Johan; Tsiopoulos, Leonidas; Lilius, Johan; Haldin, Joakim; Falk, Ulf

doi:10.1109/pdp.2015.29

Cited by 18 publications

(7 citation statements)

References 15 publications

“…16) of general purpose cores: on the i7 and the GTX 750Ti the performance goes beyond 5000 fps. Unfortunately, it was not possible to compare the GPU performance against any reference, since the only other design flow for GPU acceleration of RVC-CAL programs [21] has not yet been publicly released.…”

Section: Analysis Of Results (mentioning)

confidence: 99%

“…Besides our previous article [8], GPU programming based on the RVC-CAL language has been proposed in the work of Lund et al [21]. Although the objective is the same, Lund et al have approached the problem by taking RVC-CAL application descriptions that have previously been designed for execution on a general purpose processor, and translating those for execution on OpenCL devices.…”

Section: Related Work (mentioning)

confidence: 99%

Design Flow for GPU and Multicore Execution of Dynamic Dataflow Programs

Boutellier

Nylanden

2017

J Sign Process Syst

Dataflow programming has received increasing attention in the age of multicore and heterogeneous computing. Modular and concurrent dataflow program descriptions enable highly automated approaches for design space exploration, optimization and deployment of applications. A great advance in dataflow programming has been the recent introduction of the RVC-CAL language. Having been standardized by the ISO, the RVC-CAL dataflow language provides a solid basis for the development of tools, design methodologies and design flows. This paper proposes a novel design flow for mapping RVC-CAL dataflow programs to parallel and heterogeneous execution platforms. Through the proposed design flow the programmer can describe an application in the RVC-CAL language and map it to multi- and many-core platforms, as well as GPUs, for efficient execution. The functionality and efficiency of the proposed approach is demonstrated by a parallel implementation of a video processing application and a run-time reconfigurable filter for telecommunications. Experiments are performed on GPU and multicore platforms with up to 16 cores, and the results show that for high-performance applications the proposed design flow provides up to 4× higher throughput than the state-of-the-art approach in multicore execution of RVC-CAL programs.

“…Compared to this work, the significant difference is that the StreamIt language heeds the SDF MoC, which does not allow run-time changes in token rates. The same token rate restriction applies to two recent works [24], [25] that discuss deployment of RVC-CAL dataflow programs to heterogeneous architectures.…”

Section: B Related Programming Framework (mentioning)

confidence: 99%

PRUNE: Dynamic and Decidable Dataflow for Signal Processing on Heterogeneous Platforms

Boutellier

Huttunen

et al. 2018

IEEE Trans. Signal Process.

The majority of contemporary mobile devices and personal computers are based on heterogeneous computing platforms that consist of a number of CPU cores and one or more Graphics Processing Units (GPUs). Despite the high volume of these devices, there are few existing programming frameworks that target full and simultaneous utilization of all CPU and GPU devices of the platform. This article presents a dataflow-flavored Model of Computation (MoC) that has been developed for deploying signal processing applications to heterogeneous platforms. The presented MoC is dynamic and allows describing applications with data dependent run-time behavior. On top of the MoC, formal design rules are presented that enable application descriptions to be simultaneously dynamic and decidable. Decidability guarantees compile-time application analyzability for deadlock freedom and bounded memory. The presented MoC and the design rules are realized in a novel Open Source programming environment "PRUNE" and demonstrated with representative application examples from the domains of image processing, computer vision and wireless communications. Experimental results show that the proposed approach outperforms the state-of-the-art in analyzability, flexibility and performance.

“…Various studies have targeted automated exploitation of parallelism to map dataflow models onto heterogeneous computing platforms. Design tools that exploit various forms of parallelism using CUDA or OpenCL have been developed in [5,17,21]. These tools assume that vectorization has been specified by the designer, and map an actor onto a GPU whenever a GPU-accelerated implementation of the actor is available.…”

Section: Related Work (mentioning)

confidence: 99%

Memory-Constrained Vectorization and Scheduling of Dataflow Graphs for Hybrid CPU-GPU Platforms

Lin

Bhattacharyya

2018

ACM Trans. Embed. Comput. Syst.

The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task and data-level parallelism to meet throughput requirements of digital signal processing (DSP) applications. Moreover, in the presence of system-level memory constraints, hand optimization of code to satisfy these requirements is inefficient and error-prone, and can therefore, greatly slow down development time or result in highly underutilized processing resources. In this paper, we present vectorization and scheduling methods to effectively exploit multiple forms of parallelism for throughput optimization on hybrid CPU-GPU platforms, while conforming to system-level memory constraints. The methods operate on synchronous dataflow representations, which are widely used in the design of embedded systems for signal and information processing. We show that our novel methods can significantly improve system throughput compared to previous vectorization and scheduling approaches under the same memory constraints. In addition, we present a practical case-study of applying our methods to significantly improve the throughput of an orthogonal frequency division multiplexing (OFDM) receiver system for wireless communications.
