2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) 2015
DOI: 10.1109/samos.2015.7363667

HEVC in-loop filters GPU parallelization in embedded systems

Citations: cited by 17 publications (22 citation statements)
References: 9 publications
“…Regarding software-based GPU acceleration for video decoding, most of previous work targets only single HEVC decoding modules, such as Inverse Transform (IT) in [14,19], Motion Compensation (MC) in [9], Intra Prediction (IP) in [11], Deblocking Filter (DBF) in [16,25], and in-loop filters in [10]. In particular, Souza et al [13] presented a set of optimized GPU kernels, where they optimized and integrated individual HEVC modules.…”
Section: Related Work (mentioning)
confidence: 99%
“…For each GPU kernel, their thread block mapping is shown at the bottom. These decoding modules will be briefly introduced since their algorithm has been elaborated individually in [8]- [12]. For all target kernels, one common optimization is concerned with their ability to support video sequences with 10-bit depth, while previous approaches could only decode bitstreams with 8-bit depth.…”
Section: B. Optimization of the Decoding Procedures for GPU Execution (mentioning)
confidence: 99%
“…For each sub-filter, two edges in the same direction can be processed at the same time. The thread mapping of the DBF has been optimized over [11] and [12], where an area of 256×8 samples is cooperatively processed by two warps within a thread block. When the horizontal filter starts, each warp maps to a set of 256×4 samples, where each thread maps to one horizontal edge of 8×4 samples.…”
Section: Global Memory Host Memory (mentioning)
confidence: 99%
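
The thread mapping described in the statement above can be sketched on the host side. This is only an illustration under stated assumptions, not the citing authors' kernel: it assumes 32 threads per warp, that the 256×8 area is split into an upper and a lower 256×4 strip (one per warp), and that consecutive lanes take consecutive 8×4 segments from left to right. The function name `horizontal_filter_region` and all constants are hypothetical.

```python
# Hypothetical sketch of the horizontal-filter thread mapping:
# a 256x8 sample area shared by two warps, one 8x4 segment per thread.
WARP_SIZE = 32            # assumed threads per warp
WARPS_PER_BLOCK = 2       # two warps cooperate on one area (from the quote)
TILE_W, TILE_H = 256, 8   # area handled by one thread block (from the quote)
SEG_W, SEG_H = 8, 4       # samples handled by one thread (from the quote)


def horizontal_filter_region(thread_idx):
    """Return (x0, y0, width, height) of the segment owned by a thread.

    Assumption: warp 0 takes the upper 256x4 strip, warp 1 the lower one,
    and lanes walk the strip from left to right in 8-sample steps.
    """
    if not 0 <= thread_idx < WARP_SIZE * WARPS_PER_BLOCK:
        raise ValueError("thread index outside the two-warp block")
    warp, lane = divmod(thread_idx, WARP_SIZE)
    return (lane * SEG_W, warp * SEG_H, SEG_W, SEG_H)


if __name__ == "__main__":
    for t in (0, 31, 32, 63):
        print(t, horizontal_filter_region(t))
```

Under these assumptions the 64 threads cover the 256×8 area exactly once: 32 lanes × 8 samples span the 256-sample width, and 2 warps × 4 rows span the 8-row height.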