GPU-assisted HEVC intra decoder

Souza, Diego F. de; Ilić, Aleksandar; Roma, Nuno; Sousa, Leonel

doi:10.1007/s11554-015-0519-1

Cited by 12 publications

(9 citation statements)

References 19 publications

“…The more detailed parallelization strategies for the IT, MC, IP, and the in-loop filters (i.e. DBF and SAO) have been elaborated in [14], [9], [12], and [45], respectively. : Parallel decoding on the GPU with two independent frames in flight (and hence two cuda streams), assuming that the considered GPU has enough resources to execute multiple kernels concurrently.…”

Section: Parallel Decoding on the CPU and GPU Devices (mentioning)

confidence: 99%

Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU

Wang, Souza, Alvarez-Mesa et al. 2018

Signal Processing: Image Communication

Self Cite

The High Efficiency Video Coding (HEVC) standard provides higher compression efficiency than other video coding standards, but at the cost of an increased computational load, which makes it hard to achieve real-time encoding/decoding for ultra-high-resolution and high-quality video sequences. Graphics Processing Units (GPUs) are known to provide massive processing capability for highly parallel and regular computing kernels, but not all HEVC decoding procedures are suited for GPU execution. Furthermore, if HEVC decoding is accelerated by GPUs, energy efficiency is another concern for heterogeneous CPU+GPU decoding. In this paper, a highly parallel HEVC decoder for heterogeneous CPU+GPU systems is proposed. It exploits the available parallelism in HEVC decoding on the CPU, on the GPU, and between the CPU and GPU devices simultaneously. On top of that, different workload balancing schemes can be selected according to the devoted CPU and GPU computing resources. Furthermore, an energy-optimized solution is proposed by tuning GPU clock rates. Results show that the proposed decoder achieves better performance than the state-of-the-art CPU decoder, and that the best-performing workload balancing scheme depends on the available CPU and GPU computing resources. In particular, with an NVIDIA Titan X Maxwell GPU and an Intel Xeon E5-2699v3 CPU, the proposed decoder delivers 167 frames per second (fps) for Ultra HD 4K videos when four CPU cores are used. Compared to the state-of-the-art CPU decoder using four CPU cores, the proposed decoder gains a speedup factor of 2.2×. When decoding performance is bounded by the CPU, a system-wise energy reduction of up to 36% is achieved by using fixed (and lower) GPU clocks, compared to the default dynamic clock settings on the GPU.

“…As in [9], the warps of the IT GPU kernel are assigned according to the block partitioning obtained from the bitstream. The new optimizations that were herein introduced for the IT GPU kernel consist of: i) better data packing: the required data is stored in a 2 bytes word per 8×8 block, which includes the block sizes, transform flags, prediction type; and ii) inter predicted blocks: the new IT GPU kernel already supports the inverse transform of inter predicted blocks, which have not been considered for the intra decoder in [9].…”

Section: B. Optimization of the Decoding Procedures for GPU Execution (mentioning)

confidence: 99%
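
The data-packing idea in the statement above can be illustrated with a minimal sketch. The statement only says that the 2-byte word per 8×8 block holds the block sizes, transform flags, and prediction type; the field names, bit widths, and ordering below are hypothetical and are not taken from the cited paper.

```python
# Hypothetical field layout (NOT from the cited paper): names, widths and
# ordering are assumptions chosen only to illustrate packing into 16 bits.
FIELDS = (
    ("log2_cu_size", 3),    # assumed: log2 of the coding block size
    ("log2_tu_size", 3),    # assumed: log2 of the transform block size
    ("transform_skip", 1),  # assumed transform flag
    ("cbf", 1),             # assumed coded-block flag
    ("pred_type", 1),       # assumed: 0 = intra, 1 = inter
)

def pack(values):
    """Pack the per-block fields into one unsigned 16-bit word."""
    word = 0
    shift = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word |= v << shift
        shift += width
    if shift > 16:
        raise ValueError("layout exceeds 16 bits")
    return word

def unpack(word):
    """Recover the fields from a packed 16-bit word."""
    out = {}
    shift = 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out

example = {"log2_cu_size": 5, "log2_tu_size": 3,
           "transform_skip": 0, "cbf": 1, "pred_type": 1}
packed = pack(example)
```

With this assumed layout the five fields occupy 9 of the 16 bits, and `unpack(pack(x))` returns `x` for any in-range input.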

“…If the current frame is encoded as I frame, with only intra predicted blocks, then the IP kernel is started right after the IT kernel. As it was proposed in [9], each warp performs the intra prediction of all blocks or sub-blocks in a 8-sample row of the frame. Similarly, the thread block of this kernel consists of 8 warps, which perform a frame row with a height of 64 samples (FR×64), thus accomplishing a wavefront approach for the whole frame.…”

Section: Global Memory Host Memory (mentioning)

confidence: 99%
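
The work assignment described in the statement above (one warp per 8-sample row, 8 warps per thread block, hence one thread block per 64-sample frame row) can be sketched as simple index arithmetic. The function names and the exact mapping from a sample row to a warp are assumptions for illustration; the cited paper's actual indexing may differ.

```python
WARP_ROWS = 8         # frame height covered by one warp (from the quoted text)
WARPS_PER_BLOCK = 8   # warps per thread block (from the quoted text)
BLOCK_ROWS = WARP_ROWS * WARPS_PER_BLOCK  # 64-sample frame row per thread block

def locate(y):
    """Hypothetical mapping from sample row y to (thread_block, warp)."""
    if y < 0:
        raise ValueError("row index must be non-negative")
    return y // BLOCK_ROWS, (y % BLOCK_ROWS) // WARP_ROWS

def thread_blocks_for(frame_height):
    """Number of 64-sample frame rows needed to cover the frame height."""
    return -(-frame_height // BLOCK_ROWS)  # ceiling division

blocks_4k = thread_blocks_for(2160)  # Ultra HD 4K frame height
```

Under these assumptions a 2160-row frame needs 34 thread blocks, the last of which is only partially filled.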

Efficient HEVC decoder for heterogeneous CPU with GPU systems

Wang, Alvarez-Mesa, Ching et al. 2016

2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)

Self Cite

The High Efficiency Video Coding (HEVC) standard provides higher compression efficiency than other video coding standards but at the cost of increased computational load, which makes it hard to achieve real-time encoding/decoding of high-resolution, high-quality video sequences. In this paper, we investigate how Graphics Processing Units (GPUs) can be employed to accelerate HEVC decoding. GPUs are known to provide massive processing capability for throughput computing kernels, but the HEVC entropy decoding kernel cannot be executed efficiently on GPUs. We therefore propose a complete HEVC decoding solution for heterogeneous CPU+GPU systems, in which the entropy decoder is executed on the CPU and the remaining kernels on the GPU. Furthermore, the decoder is pipelined such that the CPU and the GPU can decode different frames in parallel. The proposed CPU+GPU decoder achieves an average frame rate of 150 frames per second for Ultra HD 4K video sequences when four CPU cores are used with an NVIDIA GeForce Titan X GPU.

“…The majority of those articles are dealing with encoding challenges. Some of them like [5] exploits Graphics Processing Units (GPUs) to accelerate the intra decoding procedure in HEVC decoder. Hardware partial implementations of H.265 in HLS are presented e.g., in [6] and [7] dealing with only part of the standard, which may imply overall challenges in implementing the entire HEVC encoding/decoding in FPGA.…”

mentioning

confidence: 99%

H.265 Inverse Transform FPGA implementation in Impulse C

Cichoń, Gorgon 2017

Annals of Computer Science and Information Systems

High Efficiency Video Coding (HEVC), a modern video compression standard, exceeds its predecessor H.264 in efficiency by 50%, but at the cost of increased complexity. It is one of the main research topics for FPGA engineers working on image compression algorithms. On the other hand, high-level synthesis tools, after a few years of lower interest from industry and academic research, have recently started to attract more attention. This paper presents an FPGA implementation of the HEVC 2D inverse DCT on a Xilinx Virtex-6 using the Impulse C high-level language. The achieved results exceed 1080p@30fps with relatively high FPGA clock frequency and moderate resource usage.
