Dynamic Partitioning-based JPEG Decompression on Heterogeneous Multicore Architectures

Sodsong, Wasuwee; Hong, Jingun; Chung, Seongwook; Lim, Yeongkyu; Kim, Shin-Dug; Burgstaller, Bernd

doi:10.1145/2578948.2560684

Cited by 7 publications (12 citation statements)

References 12 publications

“…Therefore, it is important to find an effective method to make full use of all the available computational resources of both the CPU and GPU. Recently, some approaches [3,4,5,6,7] have been developed to perform a specific task using both multi-core CPU and GPU simultaneously, instead of the CPU or GPU alone. In this paper, we present a way to distribute the workload into both the CPU and GPU, with a performance prediction model (i.e., a static strategy) including characteristics of feature extraction from the video stream data.…”

mentioning (confidence: 99%)

CPU-GPU hybrid computing for feature extraction from video stream

Lee, Kim, Park, et al. (2014). IEICE Electron. Express

In this paper, we propose a way to distribute the video analytics workload into both the CPU and GPU, with a performance prediction model including characteristics of feature extraction from the video stream data. That is, we estimate the total execution time of a CPU-GPU hybrid computing system with the performance prediction model, and determine the optimal workload ratio and how to use the CPU cores for the given workload. Based on experimental results, we confirm that our proposed method achieves higher speedups than three typical workload distributions: CPU-only, GPU-only, and CPU-GPU hybrid computing with a 50:50 workload ratio.
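One way to read the prediction-model approach is as a makespan-balancing problem: choose the CPU share so that both devices are predicted to finish at the same time. The sketch below assumes a simple linear model with hypothetical per-item costs (`cpu_cost`, `gpu_cost`) and a fixed `gpu_overhead` for transfers and launches. Lee et al.'s actual model, which also accounts for feature-extraction characteristics and CPU core allocation, is not reproduced here.

```python
def balanced_cpu_ratio(n_items, cpu_cost, gpu_cost, gpu_overhead=0.0):
    """Return the CPU share r in [0, 1] that equalises predicted CPU and GPU times.

    Hypothetical linear model, for illustration only:
        T_cpu(r) = r * n_items * cpu_cost
        T_gpu(r) = (1 - r) * n_items * gpu_cost + gpu_overhead
    Setting T_cpu(r) = T_gpu(r) and solving for r gives the formula below.
    """
    r = (n_items * gpu_cost + gpu_overhead) / (n_items * (cpu_cost + gpu_cost))
    return min(1.0, max(0.0, r))  # clamp: the device split cannot leave [0, 1]


def predicted_makespan(r, n_items, cpu_cost, gpu_cost, gpu_overhead=0.0):
    """Predicted completion time under the same model: the slower device wins."""
    t_cpu = r * n_items * cpu_cost
    # The GPU overhead is only paid if the GPU receives any work at all.
    t_gpu = (1 - r) * n_items * gpu_cost + (gpu_overhead if r < 1 else 0.0)
    return max(t_cpu, t_gpu)
```

With a CPU twice as slow per item as the GPU (`cpu_cost=2.0`, `gpu_cost=1.0`) and no overhead, the balanced split gives one third of the items to the CPU, and its predicted makespan beats CPU-only, GPU-only, and the 50:50 split under this model.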

“…Workload partitioning aims to partition a given workload (measured by flops for computation and by bytes for memory access) via dividing code or dataset among the processors of a given multiprocessor platform (e.g., Tang et al [2013], Ma et al [2012], and Sodsong et al [2014]). For homogeneous multiprocessor platforms, the general criterion [Huang and Feng 2009] of workload partitioning is to evenly partition the input dataset among the processors since dataset partitioning (DP) can achieve load balancing easily and each processor behaves identically for a particular algorithm.…”

Section: Workload Partitioning; mentioning (confidence: 99%)
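The even-split criterion for homogeneous platforms, and a natural heterogeneous variant that weights each processor by an assumed relative throughput, can both be expressed as one proportional dataset partitioner. The weights are hypothetical inputs, and the largest-remainder rounding is our choice for illustration; the cited works do not prescribe it.

```python
from fractions import Fraction


def partition(n_items, weights):
    """Split range(n_items) into contiguous (start, end) chunks, one per processor.

    Chunk sizes are proportional to `weights`; equal weights reproduce the
    even split used on homogeneous platforms. Largest-remainder rounding
    guarantees that the chunk sizes always sum to n_items.
    """
    total = Fraction(sum(weights))
    exact = [Fraction(n_items) * Fraction(w) / total for w in weights]
    sizes = [int(q) for q in exact]  # floor, since all shares are non-negative
    leftover = n_items - sum(sizes)
    # Give the remaining items to the largest fractional parts (ties: lowest index).
    order = sorted(range(len(weights)), key=lambda i: (-(exact[i] - sizes[i]), i))
    for i in order[:leftover]:
        sizes[i] += 1
    chunks, start = [], 0
    for size in sizes:
        chunks.append((start, start + size))
        start += size
    return chunks
```

For example, `partition(10, [1, 1, 1])` splits ten items evenly as far as possible, while `partition(10, [3, 1])` models a processor assumed to be three times faster than its partner.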

“…A heterogeneous platform has different processors that deliver distinct performance/energy characteristics, and this processor heterogeneity increases the design space for partitioning the workload between different processors [Angiolini et al 2006;Chung et al 2010;Koufaty et al 2010]. For example, some studies have shown that partitioning the code [Daga and Nutter 2012;Sodsong et al 2014] instead of the input dataset [Zhong et al 2012;Grasso et al 2013] can better exploit the performance/energy potential of heterogeneous platforms. Such consideration is not necessary for uniprocessor or homogeneous multiprocessor platforms.…”

Section: Introduction; mentioning (confidence: 99%)
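Partitioning the code rather than the dataset means assigning different stages of an algorithm to different processors and streaming items between them, so that one stage of item k+1 overlaps the next stage of item k. A minimal sketch of that structure, with two Python threads standing in for the CPU and GPU and hypothetical stage functions `stage_cpu` and `stage_gpu`:

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(items, stage_cpu, stage_gpu, depth=2):
    """Apply stage_cpu then stage_gpu to every item, with the two stages overlapped.

    The threads only model two processors. CPython threads share one
    interpreter lock, so this shows the structure of the overlap rather than
    real speedup; actual offload would replace stage_gpu with a kernel launch.
    """
    q = queue.Queue(maxsize=depth)  # bounded buffer between the two stages
    results = []

    def producer():
        for item in items:
            q.put(stage_cpu(item))  # blocks when the consumer falls behind
        q.put(_DONE)

    def consumer():
        while True:
            x = q.get()
            if x is _DONE:
                break
            results.append(stage_gpu(x))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because there is a single consumer reading a FIFO queue, results come back in input order, e.g. `run_pipeline(range(5), lambda x: x + 1, lambda x: x * 10)` yields `[10, 20, 30, 40, 50]`.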

PeaPaw

Tang, Barrett, Cook, et al. (2017). ACM Trans. Des. Autom. Electron. Syst.

Performance and energy are two major concerns for application development on heterogeneous platforms. It is challenging for application developers to fully exploit the performance/energy potential of heterogeneous platforms. One reason is the lack of reliable prediction of the system's performance/energy before application implementation. Another reason is that a heterogeneous platform presents a large design space for workload partitioning between different processors. To reduce such development cost, this article proposes a framework, PeaPaw, to assist application developers to identify a workload partition (WP) that has high potential leading to high performance or energy efficiency before actual implementation. The PeaPaw framework includes both analytical performance/energy models and two sets of workload partitioning guidelines. Based on the design goal, application developers can obtain a workload partitioning guideline from PeaPaw for a given platform and follow it to design one or multiple WPs for a given workload. Then PeaPaw can be used to estimate the performance/energy of the designed WPs, and the WP with the best estimated performance/energy can be selected for actual implementation. To demonstrate the effectiveness of PeaPaw, we have conducted three case studies. Results from these case studies show that PeaPaw can faithfully estimate the performance/energy relationships of WPs and provide effective workload partitioning guidelines.
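PeaPaw's workflow (estimate each candidate workload partition, then implement the best one for the design goal) can be illustrated with a deliberately simple model. Everything in this sketch, including the linear time model, the power figures, and the names `Platform`, `estimate`, and `pick_best`, is a hypothetical stand-in for PeaPaw's analytical performance/energy models, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    cpu_time_per_item: float  # seconds per item on the CPU (assumed)
    gpu_time_per_item: float  # seconds per item on the GPU (assumed)
    cpu_active_w: float       # CPU power while busy, in watts (assumed)
    gpu_active_w: float       # GPU power while busy, in watts (assumed)
    idle_w: float             # power of one device while it waits, in watts (assumed)


def estimate(platform, n_items, cpu_share):
    """Return (time_s, energy_j) for sending cpu_share of n_items to the CPU."""
    t_cpu = cpu_share * n_items * platform.cpu_time_per_item
    t_gpu = (1 - cpu_share) * n_items * platform.gpu_time_per_item
    makespan = max(t_cpu, t_gpu)
    idle_time = 2 * makespan - t_cpu - t_gpu  # combined waiting time of both devices
    energy = (platform.cpu_active_w * t_cpu
              + platform.gpu_active_w * t_gpu
              + platform.idle_w * idle_time)
    return makespan, energy


def pick_best(platform, n_items, candidate_shares, goal="time"):
    """Choose the candidate CPU share with the lowest estimated time or energy."""
    index = 0 if goal == "time" else 1
    return min(candidate_shares, key=lambda s: estimate(platform, n_items, s)[index])
```

With a slow but frugal CPU and a fast but power-hungry GPU, the two goals can select different partitions: under these made-up numbers the balanced split is fastest while the CPU-only partition uses the least energy.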

“…In previous work, we have devised a JPEG decoder capable of accelerating inverse discrete cosine transformations (IDCT), image upsampling and YCbCr to RGB color space conversions on systems consisting of a CPU and a GPU. Entropy decoding is the only remaining sequential task with this decoding scheme and thus stands in the way to further speedups.…”

Section: Introduction; mentioning (confidence: 99%)
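Of the stages named in this statement, color space conversion is the simplest to parallelize, because every pixel is converted independently of all others. The sketch below uses the full-range JFIF YCbCr-to-RGB equations; whether the cited decoder uses exactly these constants or a fixed-point variant is not stated, so treat it as illustrative. In a data-parallel GPU implementation each pixel would map to one thread; here a list comprehension plays that role.

```python
def clamp_u8(x):
    """Round to the nearest integer (Python rounds halves to even) and clamp to 0..255."""
    return max(0, min(255, int(round(x))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range (JFIF) YCbCr pixel to an (R, G, B) tuple."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp_u8(r), clamp_u8(g), clamp_u8(b)


def convert_image(pixels):
    """Per-pixel map with no dependencies between pixels (one GPU thread each)."""
    return [ycbcr_to_rgb(y, cb, cr) for (y, cb, cr) in pixels]
```

Neutral chroma (Cb = Cr = 128) leaves a gray pixel unchanged, e.g. `ycbcr_to_rgb(128, 128, 128)` returns `(128, 128, 128)`.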

“…Data transfers between CPU and GPU via the PCI bus can generate significant overhead. In our previous work, entropy decoding was conducted on the CPU, which necessitated the transfer of the inflated DCT coefficients to the GPU, where they eventually became the IDCT input. To reduce transfer costs, a run-length encoded format was employed.…”

Section: Introduction; mentioning (confidence: 99%)
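A quantized 8×8 block typically contains long runs of zero coefficients, which is why a run-length format shrinks the CPU-to-GPU transfer of inflated DCT coefficients. The exact format used in the cited work is not given here; the sketch below assumes a hypothetical encoding of each 64-coefficient block as (zero_run, value) pairs for its non-zero coefficients, terminated by an end-of-block marker.

```python
BLOCK = 64    # coefficients per 8x8 block
EOB = (0, 0)  # hypothetical end-of-block marker; never a valid pair since values are non-zero


def rle_encode(coeffs):
    """Encode one block as (zero_run, value) pairs, followed by EOB."""
    if len(coeffs) != BLOCK:
        raise ValueError("expected 64 coefficients")
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1  # count zeros preceding the next non-zero coefficient
        else:
            pairs.append((run, c))
            run = 0
    pairs.append(EOB)  # trailing zeros are implied by the marker
    return pairs


def rle_decode(pairs):
    """Expand (zero_run, value) pairs back into 64 coefficients."""
    coeffs = []
    for run, value in pairs:
        if (run, value) == EOB:
            break
        coeffs.extend([0] * run)
        coeffs.append(value)
    coeffs.extend([0] * (BLOCK - len(coeffs)))  # pad the implied trailing zeros
    return coeffs
```

A block holding only a DC value of 50 and one AC value of −3 shrinks from 64 coefficients to three pairs: `[(0, 50), (2, -3), (0, 0)]` for the input `[50, 0, 0, -3] + [0] * 60`.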

JParEnt: Parallel entropy decoding for JPEG decompression on heterogeneous multicore architectures

Sodsong, Jung, Park, et al. (2017). Concurrency and Computation. Self-citation.

The JPEG format employs Huffman codes to compress the entropy data of an image. Huffman codewords are of variable length, which makes parallel entropy decoding a difficult problem: to determine the start position of a codeword in the bitstream, the previous codeword must be decoded first. We present JParEnt, a new approach to parallel entropy decoding for JPEG decompression on heterogeneous multicores. JParEnt conducts JPEG decompression in two steps: (1) an efficient sequential scan of the entropy data on the CPU to determine the start positions (boundaries) of coefficient blocks in the bitstream, followed by (2) a parallel entropy decoding step on the graphics processing unit (GPU). The block boundary scan constitutes a reinterpretation of the Huffman-coded entropy data to determine codeword boundaries in the bitstream. We introduce a dynamic workload partitioning scheme to account for GPUs of low compute power relative to the CPU. This configuration has become common with the advent of SoCs with integrated graphics processors (IGPs). We leverage additional parallelism through pipelined execution across CPU and GPU. For systems providing a unified address space between CPU and GPU, we employ zero-copy to completely eliminate the data transfer overhead. Our experimental evaluation of JParEnt was conducted on six heterogeneous multicore systems: one server and two desktops with dedicated GPUs, one desktop with an IGP, and two embedded systems. For a selection of more than 1000 JPEG images, JParEnt outperforms the SIMD implementation of the libjpeg-turbo library by up to 4.3×, and the previously fastest JPEG decompression method for heterogeneous multicores by up to 2.2×. JParEnt's entropy data scan consumes 45% of the entropy decoding time of libjpeg-turbo on average. Given this new ratio for the sequential part of JPEG decompression, JParEnt achieves up to 97% of the maximum attainable speedup (95% on average). On the IGP-based desktop platform, JParEnt achieves energy savings of up to 45% compared to libjpeg-turbo's SIMD implementation.
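The two-step idea (a sequential scan that only finds block boundaries, then independent decoding of each block) can be shown on a toy prefix code. This is a sketch under strong simplifying assumptions: the three-symbol code table `CODE` and the fixed `SYMBOLS_PER_BLOCK` are hypothetical, whereas real JPEG blocks use separate DC/AC Huffman tables and end at an EOB code or after 64 coefficients. A thread pool stands in for the GPU's parallel decoding step.

```python
from concurrent.futures import ThreadPoolExecutor

CODE = {"0": "a", "10": "b", "11": "c"}  # hypothetical toy prefix code
SYMBOLS_PER_BLOCK = 2                    # hypothetical stand-in for one coefficient block


def next_codeword(bits, pos):
    """Return the end offset of the codeword starting at bits[pos]."""
    end = pos + 1
    while bits[pos:end] not in CODE:
        end += 1
        if end > len(bits):
            raise ValueError("truncated bitstream")
    return end


def scan_boundaries(bits):
    """Step 1 (sequential): record the bit offset at which each block starts.

    Only codeword lengths are needed here; no symbols are emitted.
    """
    starts, pos, count = [], 0, 0
    while pos < len(bits):
        if count % SYMBOLS_PER_BLOCK == 0:
            starts.append(pos)
        pos = next_codeword(bits, pos)
        count += 1
    return starts


def decode_block(bits, start):
    """Step 2 (parallel): decode one block independently from its start offset."""
    out, pos = [], start
    for _ in range(SYMBOLS_PER_BLOCK):
        if pos >= len(bits):
            break  # the last block may be shorter
        end = next_codeword(bits, pos)
        out.append(CODE[bits[pos:end]])
        pos = end
    return "".join(out)


def parallel_decode(bits, workers=4):
    """Scan boundaries sequentially, then decode all blocks concurrently."""
    starts = scan_boundaries(bits)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return "".join(pool.map(lambda s: decode_block(bits, s), starts))
```

For the bitstream `"01011010"` (the symbols a, b, c, a, b), the scan yields block starts `[0, 3, 6]`, after which the three blocks can be decoded in any order and concatenated.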
