Cost
IntroductionConsumer multimedia devices such as DVD players and cameras are very cost-sensitive. The MPEG and JPEG type algorithms they use tend to have multiplications by constants, rather than by variables. In this paper, we focus on cost-effective architectural support for such multiplications. We propose using adders enhanced with pre-shifters to perform efficient constant multiplies. This reduces the cost, as we show in the paper that an integer multiplier takes about three times the latency and three to four times the area over our design of a delay/area efficient preshift_adder to perform the preshift_add instructions. However, it is not easy to find the shortest instruction sequence for each constant multiply. We show our methodology for achieving this using a Directed Acyclic Graph (DAG) approach, to generate the shortest or nearly shortest sequence of instructions for every constant multiplier up to 15 bits. These optimal instruction sequences can be substituted by a programmer or compiler when a multiply by a constant is needed. Our performance results show that we can improve the performance while reducing the cost of constant multiplications.We use four fixed-point cases to evaluate the instruction sequences that our algorithm generates. We use CI.F to denote that the positive constant multiplier has I bits of integer and F bits of fraction. The four cases are C8.0, C12.0, C2.10 and C3.12. While we are more interested in cases like C2.10 and C3.12 for our multimedia applications, where constants tend to be fractions with few integer bits, we generate C8.0 and C12.0 sequences, to compare our results with earlier work in [4] on constant multiplication by integers.Sections 2 and 3 describe our DAG-based search algorithm for finding the shortest instruction sequence for C8.0 case and the nearly shortest instruction sequences for C12.0, C2.10 and C3.12 cases. Section 4 presents our performance results, and comparisons to earlier work [4]. Section 5 presents our design of a preshift_adder. Based on the results from section 4 and 5, we discuss the performance/area gain we achieve on an optimized DCT/IDCT algorithm in section 6.
2.Algorithm overview 2