Proceedings of Programming Models and Applications on Multicores and Manycores 2014
DOI: 10.1145/2578948.2560684
Dynamic Partitioning-based JPEG Decompression on Heterogeneous Multicore Architectures

Abstract: With the emergence of social networks and improvements in computational photography, billions of JPEG images are shared and viewed on a daily basis. Desktops, tablets and smartphones constitute the vast majority of hardware platforms used for displaying JPEG images. Despite the fact that these platforms are heterogeneous multicores, no approach exists yet that is capable of joining the forces of a system's CPU and GPU for JPEG decoding. In this paper we introduce a novel JPEG decoding scheme for heterogeneous archi…

Cited by 7 publications (12 citation statements). References 12 publications.
“…Therefore, it is important to find an effective method to make full use of all the available computational resources of both the CPU and GPU. Recently, some approaches [3,4,5,6,7] have been developed to perform a specific task using both multi-core CPU and GPU simultaneously, instead of the CPU or GPU alone. In this paper, we present a way to distribute the workload into both the CPU and GPU, with a performance prediction model (i.e., a static strategy) including characteristics of feature extraction from the video stream data.…”
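The static strategy described in the quote above — predicting each device's throughput and splitting the workload proportionally between CPU and GPU — can be sketched as follows. The function and parameter names are hypothetical, not taken from the cited paper:

```python
def split_workload(n_items, cpu_rate, gpu_rate):
    """Statically split n_items between CPU and GPU in proportion to
    their predicted throughputs (items per second)."""
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = round(n_items * gpu_share)
    return n_items - n_gpu, n_gpu  # (cpu_items, gpu_items)

# A GPU predicted to be 3x faster than the CPU takes 3/4 of the work:
cpu_items, gpu_items = split_workload(100, cpu_rate=1.0, gpu_rate=3.0)
```

A real performance-prediction model would derive the rates from measured or modeled characteristics of the input (here, feature-extraction cost per video frame); the proportional split itself stays this simple.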
“…Workload partitioning aims to partition a given workload (measured by flops for computation and by bytes for memory access) via dividing code or dataset among the processors of a given multiprocessor platform (e.g., Tang et al [2013], Ma et al [2012], and Sodsong et al [2014]). For homogeneous multiprocessor platforms, the general criterion [Huang and Feng 2009] of workload partitioning is to evenly partition the input dataset among the processors since dataset partitioning (DP) can achieve load balancing easily and each processor behaves identically for a particular algorithm.…”
Section: Workload Partitioning
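The even dataset partitioning (DP) criterion for homogeneous platforms mentioned above can be illustrated with a minimal sketch (the helper name is hypothetical): each of the identical processors receives a near-equal slice of the input, which balances load because every processor behaves the same for a given algorithm.

```python
def even_partition(data, n_procs):
    """Evenly partition a dataset among n_procs identical processors.
    The first len(data) % n_procs chunks receive one extra element."""
    q, r = divmod(len(data), n_procs)
    chunks, start = [], 0
    for i in range(n_procs):
        size = q + (1 if i < r else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks

parts = even_partition(list(range(10)), 3)  # chunk sizes: 4, 3, 3
```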
“…A heterogeneous platform has different processors that deliver distinct performance/energy characteristics, and this processor heterogeneity increases the design space for partitioning the workload between different processors [Angiolini et al 2006; Chung et al 2010; Koufaty et al 2010]. For example, some studies have shown that partitioning the code [Daga and Nutter 2012; Sodsong et al 2014] instead of the input dataset [Zhong et al 2012; Grasso et al 2013] can better exploit the performance/energy potential of heterogeneous platforms. Such consideration is not necessary for uniprocessor or homogeneous multiprocessor platforms.…”
Section: Introduction
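Code partitioning, as contrasted with dataset partitioning in the quote above, maps pipeline *stages* to the device best suited for each. The sketch below uses JPEG-style stage names purely for illustration; the stage list and the dispatch stub are assumptions, not the cited papers' implementation:

```python
# Code partitioning: assign each pipeline stage to a device, instead of
# splitting the input dataset across devices.
stage_assignment = {
    "entropy_decode": "CPU",  # inherently sequential, branch-heavy
    "idct":           "GPU",  # massively data-parallel
    "upsample":       "GPU",
    "color_convert":  "GPU",
}

def dispatch(trace, stage, device):
    # Stub: a real implementation would launch a CPU thread or a GPU kernel.
    trace.append((stage, device))
    return trace

def run_pipeline(assignment):
    """Run the stages in order, recording which device handled each."""
    trace = []
    for stage, device in assignment.items():
        trace = dispatch(trace, stage, device)
    return trace
```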
“…In previous work, we have devised a JPEG decoder capable of accelerating inverse discrete cosine transformations (IDCT), image upsampling and YCbCr to RGB color space conversions on systems consisting of a CPU and a GPU. Entropy decoding is the only remaining sequential task with this decoding scheme and thus stands in the way to further speedups.…”
Section: Introduction
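Of the GPU-accelerated stages listed above, the YCbCr-to-RGB color space conversion is the simplest to show. This is a per-pixel sketch using the standard full-range BT.601 constants that JPEG/JFIF uses, not the authors' actual kernel:

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range (JFIF/BT.601) YCbCr pixel to 8-bit RGB."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    clamp = lambda v: max(0, min(255, round(v)))
    return clamp(r), clamp(g), clamp(b)

# Neutral chroma (Cb = Cr = 128) leaves every channel at the luma value:
assert ycbcr_to_rgb(128, 128, 128) == (128, 128, 128)
```

Because the same arithmetic runs independently on every pixel, the stage maps naturally onto a GPU, with one thread per pixel.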
“…Data transfers between CPU and GPU via the PCI bus can generate significant overhead. In our previous work, entropy decoding was conducted on the CPU, which necessitated the transfer of the inflated DCT coefficients to the GPU, where they eventually became the IDCT input. To reduce transfer costs, a run‐length encoded format was employed.…”
Section: Introduction
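Run-length encoding pays off here because quantized DCT coefficient blocks are mostly zeros. A minimal sketch of the idea, using JPEG's (zero-run, value) pair convention — the exact on-the-wire format in the cited work may differ:

```python
def rle_encode(coeffs):
    """Encode a coefficient block as (zero_run, value) pairs: each pair
    records the number of zeros preceding a nonzero value."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # trailing zeros are implied, not stored

def rle_decode(pairs, length):
    """Reconstruct the block, padding trailing zeros to the full length."""
    out = []
    for run, val in pairs:
        out.extend([0] * run)
        out.append(val)
    out.extend([0] * (length - len(out)))
    return out

block = [12, 0, 0, -3, 0, 5, 0, 0]
pairs = rle_encode(block)  # [(0, 12), (2, -3), (1, 5)]
```

Transferring the three pairs instead of eight raw coefficients is what shrinks the CPU-to-GPU copy over the PCI bus.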