2018
DOI: 10.1049/iet-cdt.2017.0149

CUDA memory optimisation strategies for motion estimation

Abstract: As video processing technologies continue to grow in complexity and image resolution faster than central processing unit (CPU) performance, data-parallel computing methods will become even more important. In fact, the high-performance, data-parallel architecture of modern graphics processing units (GPUs) can reduce execution times by orders of magnitude or more. However, creating an optimal GPU implementation requires not only converting sequential implementations of algorithms into parallel ones but, more important…

Cited by 11 publications (5 citation statements) · References 14 publications
“…Finally, the kernel (GPU) returns to the caller (CPU) a set of indexed arrays, covering the whole search area [40], with the minimum distortion ratios as well as the corresponding motion vectors. To minimize the data transfer costs, memory optimization strategies are used between the kernel and the host [41,42].…”
Section: Methods (mentioning)
confidence: 99%
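The transfer-confinement pattern this statement describes can be pictured with a short host-side sketch. This is a minimal illustration, not the cited implementation: all names (motionSearchKernel, estimateMotion, MotionVector) are hypothetical, the kernel body is a placeholder, and error checking is omitted. The point is that the frames are uploaded once and only the compact per-block cost/MV arrays return to the host.

```cuda
// Minimal sketch of confining host<->device traffic to compact result
// arrays. Names are hypothetical; the kernel body is a placeholder.
#include <cuda_runtime.h>
#include <cstdint>

struct MotionVector { int16_t dx, dy; };

// Placeholder kernel: the real block-matching search logic is omitted.
__global__ void motionSearchKernel(const uint8_t* cur, const uint8_t* ref,
                                   int numBlocks,
                                   uint32_t* minCost, MotionVector* bestMV)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numBlocks) {
        minCost[i] = 0;                    // real search would fill these in
        bestMV[i]  = MotionVector{0, 0};
    }
}

void estimateMotion(const uint8_t* hCur, const uint8_t* hRef,
                    int width, int height, int numBlocks,
                    uint32_t* hMinCost, MotionVector* hBestMV)
{
    uint8_t *dCur, *dRef;
    uint32_t* dMinCost;
    MotionVector* dBestMV;
    size_t frameBytes = (size_t)width * height;

    cudaMalloc(&dCur, frameBytes);
    cudaMalloc(&dRef, frameBytes);
    cudaMalloc(&dMinCost, numBlocks * sizeof(uint32_t));
    cudaMalloc(&dBestMV, numBlocks * sizeof(MotionVector));

    // One bulk upload per frame pair; the search area itself never
    // travels back to the host.
    cudaMemcpy(dCur, hCur, frameBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dRef, hRef, frameBytes, cudaMemcpyHostToDevice);

    motionSearchKernel<<<(numBlocks + 255) / 256, 256>>>(dCur, dRef,
                                                         numBlocks,
                                                         dMinCost, dBestMV);

    // Only the compact per-block results cross the bus back to the CPU.
    cudaMemcpy(hMinCost, dMinCost, numBlocks * sizeof(uint32_t),
               cudaMemcpyDeviceToHost);
    cudaMemcpy(hBestMV, dBestMV, numBlocks * sizeof(MotionVector),
               cudaMemcpyDeviceToHost);

    cudaFree(dCur); cudaFree(dRef);
    cudaFree(dMinCost); cudaFree(dBestMV);
}
```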
“…The minimum distortion ratios from the previous processing step, which concern the whole search area, determine the motion vectors selected for the next step in the host program (CPU). Memory optimization strategies are used to confine the data transfers between the host and the kernel [36]. The minimum estimated RD-ratio is defined by Equation (1), which is also computed inside the kernel (a GPU program written in a modified C programming language).…”
Section: Methods (mentioning)
confidence: 99%
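The statement refers to Equation (1) of the citing paper, which is not reproduced on this page. As a hedged stand-in, the device helper below uses the standard Lagrangian rate-distortion cost J = D + λ·R; the actual Equation (1) may differ, and the function name, the lambda parameter, and the rate term are assumptions of this sketch.

```cuda
// Hedged stand-in for the unreproduced Equation (1): the standard
// Lagrangian rate-distortion cost J = D + lambda * R.
#include <cstdint>

__device__ inline float rdCost(uint32_t sad, uint32_t mvBits, float lambda)
{
    // Distortion (SAD) plus the rate of coding the motion vector,
    // weighted by the Lagrange multiplier.
    return (float)sad + lambda * (float)mvBits;
}
```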
“…For that reason, to minimize the data transfer costs, the kernel (GPU program) is implemented as a single program that executes both the A and B functions described above at once, without having to return partial results to the host. Only when the 4-phase search cycle is completed does the host receive the final minimum RD cost array with the respective MVs [36]. From this point on, the CPU thread continues with the next step.…”
Section: Methods (mentioning)
confidence: 99%
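A minimal sketch of the fused-kernel idea described above: the whole four-phase search runs inside one kernel, so nothing crosses the bus between phases and only the final minimum-cost array and motion vectors are written out. The one-thread-per-macroblock layout, step sizes, and block addressing are assumptions of this sketch, not the cited implementation, and boundary handling is omitted.

```cuda
// Illustrative fused kernel: all four search phases execute on the device;
// only the final results are written out for the host to copy back.
#include <cstdint>

struct MotionVector { int16_t dx, dy; };

__device__ uint32_t sad16x16(const uint8_t* cur, const uint8_t* ref,
                             int stride, int dx, int dy)
{
    uint32_t s = 0;
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x) {
            int d = (int)cur[y * stride + x]
                  - (int)ref[(y + dy) * stride + (x + dx)];
            s += (uint32_t)(d < 0 ? -d : d);
        }
    return s;
}

__global__ void fusedFourPhaseSearch(const uint8_t* cur, const uint8_t* ref,
                                     int stride, int numBlocks,
                                     uint32_t* minCost, MotionVector* bestMV)
{
    int blk = blockIdx.x * blockDim.x + threadIdx.x;
    if (blk >= numBlocks) return;

    // Illustrative addressing: macroblocks laid out along one row.
    const uint8_t* curBlk = cur + blk * 16;
    const uint8_t* refBlk = ref + blk * 16;

    MotionVector mv = {0, 0};
    uint32_t best = sad16x16(curBlk, refBlk, stride, 0, 0);

    // Both search functions and their refinements stay on the GPU; nothing
    // returns to the host until the 4-phase cycle completes.
    for (int phase = 0; phase < 4; ++phase) {
        int step = 8 >> phase;            // 8, 4, 2, 1
        int bx = mv.dx, by = mv.dy;       // best candidate this phase
        for (int dy = -step; dy <= step; dy += step)
            for (int dx = -step; dx <= step; dx += step) {
                uint32_t c = sad16x16(curBlk, refBlk, stride,
                                      mv.dx + dx, mv.dy + dy);
                if (c < best) { best = c; bx = mv.dx + dx; by = mv.dy + dy; }
            }
        mv.dx = (int16_t)bx;              // recentre for the next phase
        mv.dy = (int16_t)by;
    }
    minCost[blk] = best;                  // final results only
    bestMV[blk]  = mv;
}
```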
“…To better combine the previous CPU parallelization techniques, [9] proposed two joint algorithms based on WPP and on a traditional GOP-based division pattern. On the other hand, the effective parallel implementation of crucial parts of ME is very important [10], [11]. Sayadi et al. [10] propose memory optimization strategies to make full use of GPU resources and accelerate ME.…”
Section: Introduction (mentioning)
confidence: 99%
“…On the other hand, the effective parallel implementation of crucial parts of ME is very important [10], [11]. Sayadi et al. [10] propose memory optimization strategies to make full use of GPU resources and accelerate ME. Because the calculation of SAD or SSD is a time-consuming part of ME, [11] proposes a fast parallel implementation of SAD or SSD using a parallel reduction technique.…”
Section: Introduction (mentioning)
confidence: 99%
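The parallel-reduction SAD that this statement attributes to [11] typically looks like the classic shared-memory tree reduction below. This sketch assumes one 256-thread block per 16x16 candidate and a hypothetical kernel name (sadReduce16x16); the per-candidate pointer offsets a full search would need are omitted.

```cuda
// Sketch of a parallel-reduction SAD: 256 threads each compute one absolute
// difference of a 16x16 block, then a shared-memory tree reduction sums them.
#include <cstdint>

__global__ void sadReduce16x16(const uint8_t* cur, const uint8_t* ref,
                               int stride, uint32_t* sadOut)
{
    __shared__ uint32_t partial[256];
    int t = threadIdx.x;                  // 256 threads = 16x16 pixels
    int x = t & 15, y = t >> 4;

    int d = (int)cur[y * stride + x] - (int)ref[y * stride + x];
    partial[t] = (uint32_t)(d < 0 ? -d : d);
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int s = 128; s > 0; s >>= 1) {
        if (t < s) partial[t] += partial[t + s];
        __syncthreads();
    }
    if (t == 0) sadOut[blockIdx.x] = partial[0];   // one SAD per candidate
}
```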