2010 Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia) 2010
DOI: 10.1109/primeasia.2010.5604952

Speeding up motion estimation algorithms on CUDA technology

Cited by 10 publications (6 citation statements)
References 5 publications
“…Therefore, most of the parallel ME research work on many-core concentrates on the full-search method, which is inherently highly parallel. In Chen and Hang [15]; Cheng et al [16]; Lee and Oh [17]; Monteiro et al [18], the parallel full-search method is implemented on the GPU platform, and about 10-100x speed-ups are, respectively, obtained compared with the serial full-search method on a single core of a CPU. Although the speed-up of the full-search method is high on the GPU platform, its performance advantage is not obvious compared with the serial fast search method in HEVC or H.264/AVC on a single core of a CPU.…”
Section: Related Work (mentioning)
confidence: 99%
“…Yang et al optimized the exhaustive search with shared memory [31]. Cheng et al evaluated exhaustive search, diamond search, and four-step search in the GPU [32]. Attempts to use the GPU for deblocking and intra coding were also reported in [33] and [34].…”
Section: Encoding Using GPU (mentioning)
confidence: 99%
“…MB level parallelism is exploited where the MBs in the frame are evenly partitioned among the available processing cores. Note that the parallel ES algorithms proposed in [37][38][39][40][41], and implemented using OpenCL [37,38] and CUDA [39][40][41], are also based on MB level parallelism along with search point parallel processing to compute the cost of each search point. Simulation results are given in terms of the average number of fitness function evaluations per lab for a given frame based on the first 100 frames of every sequence.…”
Section: Comparison With Existing Parallel ME Algorithms (mentioning)
confidence: 99%
“…As a result, the proposed algorithm provides tremendous speedup if implemented on modern high performance computing (HPC) platforms ranging from multicore/many-core machine architectures to graphics processing units to supercomputers. In the literature, there have been several attempts to parallelize motion estimation [37][38][39][40][41][42]. Several works have proposed applying GPUs for motion estimation.…”
Section: Introduction (mentioning)
confidence: 99%
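
Several of the statements above describe full-search (exhaustive) motion estimation as inherently parallel, at the macroblock (MB) level and at the search-point level. The following is a minimal serial sketch in Python of why that is so: each MB, and each candidate offset within its search window, is evaluated independently of the others. It is an illustration only, not the implementation from the cited paper or from any of the citing works. The function names, the sum-of-absolute-differences (SAD) cost, the 8x8 block size and the ±4 search range are all assumptions chosen for the example.

```python
import random


def sad(cur, ref, y0, x0, y, x, block):
    """Sum of absolute differences between a block of `cur` at (y0, x0)
    and a candidate block of `ref` at (y, x)."""
    return sum(
        abs(cur[y0 + i][x0 + j] - ref[y + i][x + j])
        for i in range(block)
        for j in range(block)
    )


def full_search(cur, ref, block=8, search=4):
    """Exhaustive block matching (hypothetical helper).

    Returns a dict mapping (block_row, block_col) to the (dy, dx) offset
    into `ref` with the lowest SAD. Both loop nests are independent:
    the outer pair corresponds to MB-level parallelism, the inner pair
    to search-point parallelism.
    """
    h, w = len(cur), len(cur[0])
    mvs = {}
    for y0 in range(0, h - block + 1, block):        # one iteration per MB row
        for x0 in range(0, w - block + 1, block):    # one iteration per MB
            best = None
            for dy in range(-search, search + 1):    # one iteration per search point
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    # Skip candidates that fall outside the reference frame.
                    if not (0 <= y <= h - block and 0 <= x <= w - block):
                        continue
                    cost = sad(cur, ref, y0, x0, y, x, block)
                    if best is None or cost < best[0]:
                        best = (cost, dy, dx)
            mvs[(y0 // block, x0 // block)] = (best[1], best[2])
    return mvs


# Synthetic test data (assumption): a random 32x32 reference frame and a
# current frame whose content is the reference shifted cyclically.
random.seed(0)
H = W = 32
ref = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(y - 2) % H][(x + 3) % W] for x in range(W)] for y in range(H)]

mvs = full_search(cur, ref)
print(mvs[(1, 1)])  # offset found for an interior macroblock
```

On a GPU, the two outer loops would typically map to thread blocks and the two inner loops to threads within a block, with the minimum found by a reduction. That mapping is a common pattern rather than something the statements above specify.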