Design of an H.264/AVC Decoder with Memory Hierarchy and Line-Pixel-Lookahead

Liu, Tsu-Ming; Lee, Chen‐Yi

doi:10.1007/s11265-007-0115-0

Cited by 4 publications (3 citation statements). References 12 publications (37 reference statements).


“…This way, the fixed architecture's datapath was tuned to operate with maximum parallelism considering only full sampling. A similar approach is presented in [42], where specific fixed architectures are presented for sampling ratios other than 1:1.…”

Section: SAD Configurable Architecture (mentioning)


Analysis of Pel Decimation and Technology Choices to Reduce Energy on SAD Calculation

Seidel, Bräscher, Moraes, et al. (2020), JICS


As the number of pixels per frame tends to increase in new high-definition video coding standards such as HEVC and VP9, pel decimation appears as a viable means of increasing the energy efficiency of Sum of Absolute Differences (SAD) calculation. First, we analyze the quality costs of pel decimation using video coding software. Then we present and evaluate two VLSI architectures to compute the SAD of 4×4 pixel blocks: one that can be configured with 1:1, 2:1 or 4:1 sampling ratios, and a non-configurable one that serves as a baseline for comparisons. The architectures were synthesized for 90nm, 65nm and 45nm standard cell libraries, assuming both nominal and Low-Vdd/High-Vt (LH) cases, for maximum throughput and for a given target throughput. The impacts of both subsampling and LH on delay, power and energy efficiency are analyzed. Among a total of 24 syntheses, the 45nm/LH configurable SAD architecture achieved the highest energy efficiency for target throughput when operating in 4:1 pel decimation, spending only 2.05pJ for each 4×4 block. This corresponds to about 13.65 times less energy than the 90nm/nominal configurable architecture operating in full sampling mode at maximum throughput, and about 14.77 times less than the 90nm/nominal non-configurable synthesis for target throughput. Aside from the improvements achieved by using LH, pel decimation alone was responsible for energy reductions of 40% and 60% when choosing 2:1 and 4:1 subsampling ratios, respectively, in the configurable architecture. Finally, it is shown that the configurable architecture is more energy-efficient than the non-configurable one.
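To make the three sampling ratios concrete, here is a minimal Python sketch of a 4×4 SAD with 1:1, 2:1 and 4:1 pel decimation. The decimation patterns (a checkerboard for 2:1, one pixel per 2×2 quad for 4:1) and the names `sample_mask` and `sad_4x4` are hypothetical choices for illustration; the abstract does not state which pixels the configurable architecture skips.

```python
# Sketch: SAD of a 4x4 block with optional pel decimation.
# ASSUMPTION: the subsampling patterns below (checkerboard for 2:1,
# top-left pixel of each 2x2 quad for 4:1) are illustrative, not
# necessarily the patterns used in the cited architecture.

Block = list[list[int]]


def sample_mask(ratio: int) -> list[tuple[int, int]]:
    """Return the (row, col) positions of a 4x4 block that are evaluated."""
    coords = [(y, x) for y in range(4) for x in range(4)]
    if ratio == 1:  # 1:1, full sampling: all 16 pixels
        return coords
    if ratio == 2:  # 2:1, checkerboard: 8 pixels
        return [(y, x) for (y, x) in coords if (x + y) % 2 == 0]
    if ratio == 4:  # 4:1, one pixel per 2x2 quad: 4 pixels
        return [(y, x) for (y, x) in coords if x % 2 == 0 and y % 2 == 0]
    raise ValueError("supported ratios are 1, 2 and 4")


def sad_4x4(cur: Block, ref: Block, ratio: int = 1) -> int:
    """Sum of absolute differences over the sampled pixels only (unscaled)."""
    return sum(abs(cur[y][x] - ref[y][x]) for (y, x) in sample_mask(ratio))


if __name__ == "__main__":
    cur = [[10] * 4 for _ in range(4)]
    ref = [[7] * 4 for _ in range(4)]
    for r in (1, 2, 4):
        print(f"{r}:1 ->", sad_4x4(cur, ref, r))
```

For two uniform blocks that differ by 3 at every pixel, the script prints 48, 24 and 12. The decimated SAD is deliberately left unscaled, so comparing it against a full-sampling SAD would need a ×2 or ×4 correction; fewer absolute-difference operations per block is where the energy saving of decimation comes from.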



“…The critical design decision is to determine the size of the on-chip memory and decide which part of reference data should be stored. Design [25] proposed a Line-Pixel-Lookahead scenario to predict the size of the buffer.…”

Section: PoC (Partial-on-Chip) (mentioning)


“…2) For PoC scheme, we assume 1/8 memory size compared with FoC, and 60% missing rate according to [25].…”

Section: PoC (Partial-on-Chip) (mentioning)


Methods for Power/Throughput/Area Optimization of H.264/AVC Decoding

Liu, Guo, et al. (2009), J Sign Process Syst Sign Image Video Technol (self-citation)


This paper presents methods for efficient optimization of ASIC implementation for H.264/AVC video decoding. A systematic approach to optimization is presented in a top-down flow. Tradeoffs among Power, Throughput, and Area (PTA) at both system level and block level are studied and balanced. The system architecture is first evaluated. We then focus on the pipeline organization, parallelism, and memory architecture optimization. Different pipeline granularities are compared and their pros and cons are evaluated. Various parallel scenarios, especially 1×4-column and 4×1-row, are analyzed and compared. Then the detailed designs of various building blocks, such as inverse transform, inter prediction, and deblocking filter, are evaluated, and their intrinsic characteristics are exploited to facilitate PTA optimization. Finally, we provide design guidelines for ASIC implementation based on the analysis and our design experience with five dedicated decoder chips.
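The abstract lists the inverse transform among the building blocks. As a concrete reference point, here is a minimal Python sketch of the standard H.264/AVC 4×4 inverse integer transform, assuming the input coefficients have already been dequantized. Because the transform separates into 4-point row and column passes, it is one place where a 4-sample-wide datapath (for example a 1×4-column or 4×1-row arrangement) could be applied; that mapping is our illustration, not a detail from the paper, and the function names are hypothetical.

```python
# Sketch: H.264/AVC 4x4 inverse integer transform for residual blocks.
# ASSUMPTION: `coeffs` are already-dequantized transform coefficients.
# The transform is separable: a 1-D butterfly on each row, then each column.


def idct_1d(d0: int, d1: int, d2: int, d3: int) -> tuple[int, int, int, int]:
    """One 4-point butterfly; >> is an arithmetic (floor) shift, as in the spec."""
    e = d0 + d2
    f = d0 - d2
    g = (d1 >> 1) - d3
    h = d1 + (d3 >> 1)
    return e + h, f + g, f - g, e - h


def inverse_transform_4x4(coeffs: list[list[int]]) -> list[list[int]]:
    # Horizontal pass: each row of 4 coefficients goes through the butterfly.
    rows = [list(idct_1d(*row)) for row in coeffs]
    # Vertical pass: each column of the intermediate result.
    out = [[0] * 4 for _ in range(4)]
    for x in range(4):
        col = idct_1d(rows[0][x], rows[1][x], rows[2][x], rows[3][x])
        for y in range(4):
            # Final rounding and normalization: (value + 32) >> 6.
            out[y][x] = (col[y] + 32) >> 6
    return out


if __name__ == "__main__":
    dc_only = [[64, 0, 0, 0]] + [[0] * 4 for _ in range(3)]
    for line in inverse_transform_4x4(dc_only):
        print(line)
```

With a DC-only input of 64, every output residual is 1, since (64 + 32) >> 6 = 1. In hardware, the row and column passes would typically share one butterfly unit with a transpose buffer between them, or use two units pipelined back to back; which option suits a given PTA target is the kind of tradeoff the abstract describes.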
