Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures 2004
DOI: 10.1145/1007912.1007946

Efficient orchestration of sub-word parallelism in media processors

Citations: cited by 3 publications (3 citation statements)
References: 16 publications
“…Motorola's AltiVec (6) includes a three operand instruction (vperm) for data rearrangement. Oliver et al (14) propose to include a subword permutation unit (SPU) in the execution pipeline. This SPU allows data permutation operations to be performed before other operations by removing permutation instructions from the instruction stream and instead having the SPU controller schedule the rearrangement instructions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Motorola's AltiVec [7] includes a three operand instruction (vperm) for data rearrangement. Oliver et al [16] propose to include a subword permutation unit (SPU) in the execution pipeline. This SPU allows data permutation operations to be performed before other operations by removing permutation instructions from the instruction stream and instead having the SPU controller schedule the rearrangement instructions.…”
Section: Related Work (mentioning)
confidence: 99%
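
The three-operand permute that these statements attribute to AltiVec can be illustrated with a small software model. The sketch below is not taken from the paper or from the citing publications. It assumes the usual vperm behavior, where each byte of the third operand uses its low five bits to pick one byte out of the 32 bytes formed by the first two operands. The function name `vperm` and the sample vectors are illustrative only.

```python
def vperm(va, vb, vc):
    """Software model of a vperm-style byte permute on 16-byte vectors.

    Assumption: each byte of vc selects, via its low 5 bits, one byte
    from the 32-byte concatenation of va followed by vb.
    """
    if not (len(va) == len(vb) == len(vc) == 16):
        raise ValueError("each operand must hold exactly 16 bytes")
    table = bytes(va) + bytes(vb)  # 32-byte lookup table
    return bytes(table[sel & 0x1F] for sel in vc)


# Two sample source vectors whose byte values equal their table index.
a = bytes(range(16))
b = bytes(range(16, 32))

# Control vector that reverses the bytes of a.
reverse_a = bytes(range(15, -1, -1))

# Control vector that interleaves the low halves of a and b.
interleave = bytes(i // 2 + (16 if i % 2 else 0) for i in range(16))

reversed_bytes = vperm(a, b, reverse_a)
interleaved_bytes = vperm(a, b, interleave)
```

A single control vector expresses any rearrangement of the 32 input bytes. This is the kind of data rearrangement that, according to the statements above, the proposed subword permutation unit takes out of the instruction stream.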
“…Matrix Multiply 16 + (39 + 48 · N/2) · M/16 16 + (39 + 48 · N/2) · M/16 5 + (20 + 18 · N/4) · M/4 Input data less than or equal 16-bit Vector/Matrix Multiply 16 + (39 + 48 · N/2) · M/16 5 + (10 + 15 · N/2) · M/8 5 + (47 + 39 · N/8) · M/8 input data is less than or equal 12-bit Matrix/Matrix Multiply…”
mentioning
confidence: 99%