Efficient orchestration of sub-word parallelism in media processors

Oliver, John; Akella, Venkatesh; Chong, Frederic T.

doi:10.1145/1007912.1007946

Cited by 3 publications

(3 citation statements)

References 16 publications


“…Motorola's AltiVec (6) includes a three operand instruction (vperm) for data rearrangement. Oliver et al (14) propose to include a subword permutation unit (SPU) in the execution pipeline. This SPU allows data permutation operations to be performed before other operations by removing permutation instructions from the instruction stream and instead having the SPU controller schedule the rearrangement instructions.…”

Section: Related Work (mentioning)

confidence: 99%

Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Shahbahrami

Juurlink

Borodin

et al. 2006

Int J Parallel Prog


Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the Data-Level Parallelism in multimedia applications. However, the performance improvement obtained by employing SIMD instructions is often limited because frequently many overhead instructions are required to bring data in a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead. The second technique, Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many two-dimensional multimedia algorithms such as the (I) Discrete Cosine Transform, 2 × 2 Haar Transform, and pixel padding. In addition, we propose a few new media instructions. Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.
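The extended-subword idea in the abstract above can be illustrated with a minimal sketch. It assumes unsigned lanes that wrap around on overflow; the helper name `packed_add` and the sample values are hypothetical and do not come from the paper. It only shows why four extra bits per byte let a packed addition keep its full result without first unpacking to a wider format.

```python
# Hypothetical illustration, not the authors' implementation.
# Assumption: lanes are unsigned and wrap around at the lane width.

def packed_add(a, b, lane_bits):
    """Add two lists of unsigned lanes, wrapping each sum at lane_bits."""
    mask = (1 << lane_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]

# Two vectors of 8-bit pixel values (sample data chosen for illustration).
pixels_a = [200, 100, 255, 7]
pixels_b = [100, 100, 255, 1]

# Conventional 8-bit subwords: sums above 255 wrap and lose information.
wrapped = packed_add(pixels_a, pixels_b, 8)

# Extended subwords: 8 data bits plus 4 extra bits give 12-bit lanes,
# so the same sums fit without overflow.
extended = packed_add(pixels_a, pixels_b, 12)

print(wrapped)   # [44, 200, 254, 8]
print(extended)  # [300, 200, 510, 8]
```

With 12-bit lanes, up to sixteen 8-bit values can be accumulated before a lane can overflow, which is the sense in which the conversion overhead is avoided in this sketch.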



“…Motorola's AltiVec [7] includes a three operand instruction (vperm) for data rearrangement. Oliver et al [16] propose to include a subword permutation unit (SPU) in the execution pipeline. This SPU allows data permutation operations to be performed before other operations by removing permutation instructions from the instruction stream and instead having the SPU controller schedule the rearrangement instructions.…”

Section: Related Work (mentioning)

confidence: 99%

“…Matrix Multiply | 16 + (39 + 48 · N/2) · M/16 | 16 + (39 + 48 · N/2) · M/16 | 5 + (20 + 18 · N/4) · M/4 | Input data less than or equal 16-bit | Vector/Matrix Multiply | 16 + (39 + 48 · N/2) · M/16 | 5 + (10 + 15 · N/2) · M/8 | 5 + (47 + 39 · N/8) · M/8 | input data is less than or equal 12-bit | Matrix/Matrix Multiply…”

mentioning

confidence: 99%

Matrix register file and extended subwords

Shahbahrami

Juurlink

Vassiliadis

2005

Proceedings of the 2nd Conference on Computing Frontiers


In this paper we employ two techniques suitable for embedded media processors. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead because of mismatch between storage and computational formats. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many block-based multimedia kernels such as (I)DCT, 2 × 2 Haar Transform, and pixel padding. In addition, we propose a few new media instructions. We employ Modified MMX (MMMX), MMX with extended subwords, to evaluate these techniques. Our results show that MMMX combined with an MRF reduces the dynamic number of instructions by up to 80% compared to other multimedia extensions such as MMX.
