2023
DOI: 10.3390/electronics12102223
|View full text |Cite
|
Sign up to set email alerts
|

On Combining Wavefront and Tile Parallelism with a Novel GPU-Friendly Fast Search

Abstract: As the necessity of supporting ever-increasing demands in video resolution leads to new video coding standards, the challenge of harnessing their computational overhead becomes important. Such overhead stems not only from the increased image data due to higher resolutions but also from the coding techniques per se that are introduced by each standard to improve compression. All modern standards in the field of video coding offer high compression efficiency, but this is achieved by increasing the computational … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

0
9
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
1
1

Relationship

1
1

Authors

Journals

citations
Cited by 2 publications
(9 citation statements)
references
References 28 publications
0
9
0
Order By: Relevance
“…This modified encoder uses tiles as a primary parallelism scheme but internally uses WPP for each tile separately with multiple CPU cores working in a parallel pattern. Extending the original WTP encoder, the GPU's fast search algorithm has been demonstrated (4PhaseFS) [10] to speed up the motion estimation calculations, which has been proven to be one of the most intensive tasks in a video encoder. As in our previous works, to achieve the best performance from our experiments, the application thread pool was fed with a pre-selected number of threads that matched exactly with the physical CPU cores we had available for our setup.…”
Section: Methodsmentioning
confidence: 99%
See 4 more Smart Citations
“…This modified encoder uses tiles as a primary parallelism scheme but internally uses WPP for each tile separately with multiple CPU cores working in a parallel pattern. Extending the original WTP encoder, the GPU's fast search algorithm has been demonstrated (4PhaseFS) [10] to speed up the motion estimation calculations, which has been proven to be one of the most intensive tasks in a video encoder. As in our previous works, to achieve the best performance from our experiments, the application thread pool was fed with a pre-selected number of threads that matched exactly with the physical CPU cores we had available for our setup.…”
Section: Methodsmentioning
confidence: 99%
“…In contrast, optimized software algorithms have been introduced to reduce the computational load [6][7][8] in variants of existing software video encoders without any additional costs. The proposed work in this paper is also a software algorithm that extends our proven hybrid encoder [9,10] to increase the speedup times by minimizing the calculations of the fraction motion estimation (FME) part by skipping them when specific criteria are met inside a fraction execution resolver (FER) function. Our encoder already has a GPU-friendly fast integer motion estimation algorithm [10], so this extension focuses only on the fraction optimization part.…”
Section: Introductionmentioning
confidence: 99%
See 3 more Smart Citations