A 300-MHz 16-b BiCMOS video signal processor

Inoue, Toshiaki; Goto, J.; Yamashina, M.; Suzuki, Kazumasa; Nomura, Masahiro; Koseki, Yoshiyuki; Kimura, Tsukasa; Atsumo, T.; Motomura, Masato; Shih, B.S.; Horiuchi, T.; Hamatake, N.; Kumagai, K.; Enomoto, Tadayoshi; Yamada, H.; Takada, M.

doi:10.1109/4.262006

Cited by 13 publications

(2 citation statements)

References 8 publications


“…on adapted processor modules. 1) Increase of Clock Frequency: The S-VSP [5] and the VSP3 [6] from NEC are typical of video signal processors based on an enhanced DSP core. Their performances are based on the use of a high clock frequency allowing intensive pipelining.…”

Section: B Video Signal Processors (mentioning)

confidence: 99%

Toward hardware building blocks for software-only real-time video processing: the MOVIE approach

Charot

Fol

Lemonnier

et al. 1999

IEEE Trans. Circuits Syst. Video Technol.


The goal of the MOVIE very large-scale integration chip is to facilitate the development of software-only solutions for real-time video processing applications. This chip can be seen as a building block for single-instruction, multiple-data processing, and its architecture has been designed so as to facilitate high-level language programming. The basic architecture building block associates a subarray of computation processors with an I/O processor. A module can be seen as a small linear, systolic-like array of processing elements, connected at each end to the I/O processor. The module can communicate with its two nearest neighbors via two communication ports. The chip architecture also includes three 16-bit video ports. One important aspect in the programming environment is the C-stolic programming language. C-stolic is a C-like language augmented with parallel constructs, which allow the differentiation between the array controller variables (scalar variables) and the local variables in the array structure (systolic variables). A statement operating on systolic variables implies a simultaneous execution on all the cells of the structure. Implementation examples of MOVIE-based architectures dealing with video compression algorithms are given.

Index Terms: code generation, single-instruction, multiple-data (SIMD) architecture, systolic architecture, very large-scale integration (VLSI) circuit, video compression, video processing.
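The distinction between scalar and systolic variables can be illustrated with a small Python sketch. This is not C-stolic syntax, which the abstract does not show. The `SystolicVar` class, its `shift_right` method, and the `fill` parameter are hypothetical names that only mimic the stated semantics: a statement on a systolic variable applies to every cell at once, while scalar values belong to the array controller.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SystolicVar:
    """Hypothetical stand-in for a systolic variable: one value per cell."""
    cells: List[int]

    def __add__(self, other: Union["SystolicVar", int]) -> "SystolicVar":
        # Cell-wise addition: conceptually every cell executes the same add.
        if isinstance(other, SystolicVar):
            return SystolicVar([a + b for a, b in zip(self.cells, other.cells)])
        # A scalar (controller) value is applied identically to every cell.
        return SystolicVar([a + other for a in self.cells])

    def shift_right(self, fill: int = 0) -> "SystolicVar":
        # Assumed neighbor communication: each cell hands its value to the
        # cell on its right; `fill` models a value entering at the left end.
        return SystolicVar([fill] + self.cells[:-1])


if __name__ == "__main__":
    offset = 10                      # scalar variable (array controller)
    pixels = SystolicVar([1, 2, 3])  # systolic variable (one value per cell)
    print((pixels + offset).cells)
    print(pixels.shift_right(fill=9).cells)
```

The sketch runs the cells sequentially in a list comprehension; on the hardware described, the same statement would execute on all cells simultaneously.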


“…We use the data from a 300-MHz 16-bit video processor built in 1993 at NEC with 0.5 µm BiCMOS technology [8][9][10] for an estimation [7] of the area and delay of a multiplier verses a preshift_adder. The estimated data are listed in table 4.…”

Section: Area and Delay Estimation And Comparison (mentioning)

confidence: 99%

Cost-effective multiplication with enhanced adders for multimedia applications

Luo

Lee

2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No


Consumer multimedia devices such as DVD players and cameras are very cost-sensitive. The MPEG and JPEG type algorithms they use tend to have multiplications by constants, rather than by variables. In this paper, we focus on cost-effective architectural support for such multiplications. We propose using adders enhanced with pre-shifters to perform efficient constant multiplies. This reduces the cost, as we show in the paper that an integer multiplier takes about three times the latency and three to four times the area over our design of a delay/area efficient preshift_adder to perform the preshift_add instructions. However, it is not easy to find the shortest instruction sequence for each constant multiply. We show our methodology for achieving this using a Directed Acyclic Graph (DAG) approach, to generate the shortest or nearly shortest sequence of instructions for every constant multiplier up to 15 bits. These optimal instruction sequences can be substituted by a programmer or compiler when a multiply by a constant is needed. Our performance results show that we can improve the performance while reducing the cost of constant multiplications.

We use four fixed-point cases to evaluate the instruction sequences that our algorithm generates. We use CI.F to denote that the positive constant multiplier has I bits of integer and F bits of fraction. The four cases are C8.0, C12.0, C2.10 and C3.12. While we are more interested in cases like C2.10 and C3.12 for our multimedia applications, where constants tend to be fractions with few integer bits, we generate C8.0 and C12.0 sequences, to compare our results with earlier work in [4] on constant multiplication by integers.

Sections 2 and 3 describe our DAG-based search algorithm for finding the shortest instruction sequence for C8.0 case and the nearly shortest instruction sequences for C12.0, C2.10 and C3.12 cases. Section 4 presents our performance results, and comparisons to earlier work [4]. Section 5 presents our design of a preshift_adder. Based on the results from section 4 and 5, we discuss the performance/area gain we achieve on an optimized DCT/IDCT algorithm in section 6.
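The idea of replacing a constant multiply with a short sequence of shift-and-add steps can be illustrated with a minimal Python sketch. The `preshift_add` helper and its operand order are assumptions made for illustration, since the abstract does not define the instruction. The decomposition shown is a plain binary expansion of the constant, not the DAG-based shortest-sequence search the abstract describes, so it will often use more steps than the paper's sequences.

```python
def preshift_add(a: int, b: int, shift: int) -> int:
    """Hypothetical preshift_add: shift the first operand left, then add the second."""
    return (a << shift) + b


def multiply_by_constant(x: int, c: int) -> int:
    """Compute x * c for a positive integer constant c using only shifts and adds.

    Walks the binary digits of c from the most significant bit down, folding
    each run of zeros into the pre-shift of the next add. This is a simple
    binary expansion, not a shortest-sequence search.
    """
    if c <= 0:
        raise ValueError("c must be a positive integer")
    acc = x       # accounts for the leading 1 bit of c
    pending = 0   # left shifts accumulated since the last add
    for bit in bin(c)[3:]:  # digits after the leading '1'
        pending += 1
        if bit == "1":
            acc = preshift_add(acc, x, pending)
            pending = 0
    return acc << pending   # apply any trailing zeros of c


if __name__ == "__main__":
    # 10 = 0b1010: one preshift_add gives 5x, one final shift gives 10x.
    print(multiply_by_constant(7, 10))
```

For a fractional case such as C2.10, the same routine could be applied to the constant scaled by 2^10, with the product shifted right by 10 bits afterwards; that scaling convention is likewise an assumption and is not stated in the abstract.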


Smart Cameras and MPSoCs

Wolf

2021

Multi‐Processor System‐on‐Chip 2
