Programming modern embedded vision systems poses various challenges, owing to the steep learning curve for programmers and the differing characteristics of the devices. Quasar, a new high-level programming language and development environment, considerably simplifies this development. Quasar comprises a compiler that detects and optimizes parallel programming patterns and a heterogeneous runtime that distributes the computational load over the available compute devices (CPUs and graphics processing units [GPUs]). In this paper, we focus on the runtime aspects of Quasar. We show that, to a good approximation, the execution time of a GPU kernel function can be factorized into a compile-time-specific component and a runtime-specific component, and that this approximation leads to a computationally simple runtime load balancing rule. Moreover, the load balancing rule permits efficient implicit concurrency of kernel functions and automatic scaling to multiple compute devices (eg, multi-CPU/GPU systems). Based on an appropriate mathematical scheduling model, we investigate how the command queue size trades off memory usage against device utilization. The result is a programming environment for embedded vision systems in which automatic parallelization and implicit concurrency detection allow programs to scale efficiently to multi-CPU/GPU systems. Finally, benchmark results demonstrate the performance of our approach compared with OpenACC and CUDA (Compute Unified Device Architecture).