Motion Compensation and Reconstruction of H.264/AVC Video Bitstreams using the GPU

Pieters, Bart E.; Rijsselbergen, Dieter Van; Neve, Wesley De; Walle, Rik Van de

doi:10.1109/wiamis.2007.58

Cited by 6 publications

(5 citation statements)

References 4 publications


“…To better understand the issues of the H.264/MPEG-4 AVC case study one approach would be to look on examples where others have made hand-coded OpenCL or CUDA implementations. With this algorithm even hand-coded implementations struggle with these kinds of programs and sophisticated approaches are employed to get performance out of the algorithm [26], [17]. A lot of work is needed, both at the compiler side and on the RVC-CAL level to reach the same level of performance as CPU-only implementations.…”

Section: Discussion (mentioning)

confidence: 99%

“…The main difference between tightly coupled GPUs and discrete GPUs from an OpenCL point of view is that it is possible to map buffers between the host and the device, compared to discrete devices where it is necessary to perform a copy operation. Different devices have different characteristics and there is a lot of work done on optimizing hand written code for GPUs in particular [16], [26], [17].…”

Section: B. OpenCL (mentioning)

confidence: 99%


Execution of Dataflow Process Networks on OpenCL Platforms

Lund, Kanur, Ersfolk, et al. 2015

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing


The trend in computing systems is to combine various kinds of processing elements (PEs) to build more parallel architectures. This trend leads to more heterogeneous computing systems, for which abstractions are needed to program the systems efficiently without increasing the programming cost. This has led to new programming languages and application programming interfaces (APIs). Parallel programming has always been a holy grail in computer science, and dataflow programming promises a way to automatically provide parallel constructs for the programmer. This paper provides an approach to translate dataflow process networks (DPNs) into programs that run some of the computations on the Open Computing Language (OpenCL) platform, which supports running programs on massively parallel hardware such as graphics processing units (GPUs). We show that certain DPN programs could run very efficiently on data-parallel architectures but also that there are certain patterns in DPN programs that prove problematic.


“…8. In this example, a lower bound unit inspects vertex 2 and its three edges (2,11), (2,19), and (2,49).…”

Section: Extracting Fine-grain Parallelism (mentioning)

confidence: 99%

“…Possibly the best-known use of coprocessors are graphics processor units (GPUs), which accelerate 3-D rendering [1] and high-definition video playback [2]. While GPUs now form a substantial market in consumer computing, we believe that coprocessor acceleration also has the potential to achieve a significant impact for scientific computing.…”

Section: Introduction (mentioning)

confidence: 99%

A Special-Purpose Architecture for Solving the Breakpoint Median Problem

Bakos, Elenis 2008

IEEE Trans. VLSI Syst.


In this paper, we describe the design for a co-processor for whole-genome phylogenetic reconstruction. Our current design performs a parallelized breakpoint median computation, which is an expensive component of the overall application. When implemented on a field-programmable gate array (FPGA), our hardware breakpoint median achieves a maximum speedup of 1005 over software. When the coprocessor is used to accelerate the entire reconstruction procedure, we achieve a maximum application speedup of 417. The results in this paper suggest that FPGA-based acceleration is a promising approach for computationally expensive phylogenetic problems, in spite of the fact that the involved algorithms are based on complex, control-dependent combinatorial optimization.


“…Indeed, a preliminary GPU-based implementation of the deblocking filter for instance reduced the rendering speed to 2 frames per second for all renderers. For more information regarding limitations of H.264/AVC decoding on the GPU, we refer to Pieters et al. [13]…”

Section: Comparison (mentioning)

confidence: 99%

Performance evaluation of H.264/AVC decoding and visualization using the GPU

Pieters, Rijsselbergen, Neve, et al. 2007

Applications of Digital Image Processing XXX

Self Cite


The coding efficiency of the H.264/AVC standard makes the decoding process computationally demanding. This has limited the availability of cost-effective, high-performance solutions. Modern computers are typically equipped with powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations. These GPUs can be addressed by means of a 3-D graphics API such as Microsoft Direct3D or OpenGL, using programmable shaders as generic processing units for vector data. The new CUDA (Compute Unified Device Architecture) platform of NVIDIA provides a straightforward way to address the GPU directly, without the need for a 3-D graphics API in the middle. In CUDA, a compiler generates executable code from C code with specific modifiers that determine the execution model. This paper first presents an in-house H.264/AVC renderer, which is capable of executing motion compensation (MC), reconstruction, and Color Space Conversion (CSC) entirely on the GPU. To steer the GPU, Direct3D combined with programmable pixel and vertex shaders is used. Next, we also present a GPU-enabled decoder utilizing the new CUDA architecture from NVIDIA. This decoder performs MC, reconstruction, and CSC on the GPU as well. Our results compare both GPU-enabled decoders, as well as a CPU-only decoder, in terms of speed, complexity, and CPU requirements. Our measurements show that a significant speedup is possible, relative to a CPU-only solution. As an example, real-time playback of high-definition video (1080p) was achieved with our Direct3D and CUDA-based H.264/AVC renderers.
