2014
DOI: 10.1186/1687-5281-2014-16

Implementation of fast HEVC encoder based on SIMD and data-level parallelism

Abstract: This paper presents several optimization algorithms for a High Efficiency Video Coding (HEVC) encoder based on single instruction multiple data (SIMD) operations and data-level parallelism. Based on the analysis of the computational complexity of HEVC encoder, we found that interpolation filter, cost function, and transform take around 68% of the total computation, on average. In this paper, several software optimization techniques, including frame-level interpolation filter and SIMD implementation for those c…
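To illustrate what a SIMD implementation of a cost function can look like, here is a minimal sketch that assumes the cost is a sum of absolute differences (SAD) over 8-bit samples. The abstract does not say which cost functions or instruction sets the paper uses, so the function names `sad_scalar` and `sad_simd`, the choice of SSE2, and the block layout (row-major with a `stride`) are all assumptions made for illustration, not the authors' code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar reference: SAD over a width x height block of 8-bit samples. */
static uint32_t sad_scalar(const uint8_t *a, const uint8_t *b,
                           int width, int height, int stride)
{
    uint32_t sum = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            sum += (uint32_t)abs((int)a[y * stride + x] - (int)b[y * stride + x]);
    return sum;
}

/* Hypothetical SIMD variant: handles 16 samples per instruction when SSE2
 * is available and falls back to the scalar loop otherwise. */
static uint32_t sad_simd(const uint8_t *a, const uint8_t *b,
                         int width, int height, int stride)
{
    assert(width >= 0 && height >= 0 && stride >= width);
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128(); /* two 64-bit partial sums */
    uint32_t tail = 0;                 /* samples left over after 16-wide steps */
    for (int y = 0; y < height; ++y) {
        const uint8_t *pa = a + y * stride;
        const uint8_t *pb = b + y * stride;
        int x = 0;
        for (; x + 16 <= width; x += 16) {
            __m128i va = _mm_loadu_si128((const __m128i *)(pa + x));
            __m128i vb = _mm_loadu_si128((const __m128i *)(pb + x));
            /* _mm_sad_epu8 yields one SAD per 8-byte half of the register. */
            acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
        }
        for (; x < width; ++x)
            tail += (uint32_t)abs((int)pa[x] - (int)pb[x]);
    }
    /* Add the low and high 64-bit partial sums, then the scalar tail. */
    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(acc);
    uint32_t hi = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
    return lo + hi + tail;
#else
    return sad_scalar(a, b, width, height, stride);
#endif
}
```

Both functions should return the same value for any block; the SIMD path only changes how many samples are processed per instruction, which is the data-level parallelism the abstract refers to.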

Cited by 50 publications (41 citation statements)
References 14 publications
“…Parallelizing DCT with SIMD instructions was one of the targets of [4]. In [5] a combined CPU-GPUs approach for parallel motion estimation is presented, while [6] includes a comparative study between motion estimation parallelism using CUDA cores, MPI and OpenMP.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [9] the problem of balancing slices was tackled by introducing more slices than the number of available cores. In HEVC, the authors in [10] evaluated slice parallelism using fixed slices under various encoding scenarios, while in [4] the CTU cost estimation used for adapting slice size, was based on weighting upon the depth of each CU comprising the CTU. Last, in [11] the GOP structure for LD setting was used to estimate CTU cost.…”
Section: Related Work (mentioning)
confidence: 99%
“…The technique was implemented on hardware-based approach over VLSI architecture. Ahn et al [10] have discussed various optimization techniques using H.265 with the aid of parallel processing. Also, the authors have presented a task scheduling technique along with slicing process in parallelization over multicourse using H.265.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…But in this paper, we mainly optimize HEVC encoding. Ahn et al [11] parallelized HEVC encoder interpolation filter, cost function, and transform to reduce the complexity of HEVC, which is caused by single instruction, multiple data (SIMD) operations, and data-level parallelism. Min et al [12] proposed a distributed video coding method with a hierarchical group of picture structure.…”
Section: Related Work (mentioning)
confidence: 99%