Hardware design for the 32×32 IDCT of the HEVC video coding standard

Conceição, Ruhan; Souza, José Cláudio de; Jeske, Ricardo; Porto, Marcelo; Mattos, Júlio C. B.; Agostini, Luciano

doi:10.1109/sbcci.2013.6644881

Cited by 5 publications

(6 citation statements)

References 8 publications

“…The DCT implementation proposed by Zhao et al [6] consumes 40541 logic elements. The IDCT implementation developed by Conceição et al [18] requires 28311 ALUTs and 15367 registers (implemented in separate ALUTs). Although carry-chains are highly optimised, the wiring between successive layers of adders/subtractors introduces significant latencies.…”

Section: Design Comparison (mentioning)

confidence: 99%

“…Although carry-chains are highly optimised, the wiring between successive layers of adders/subtractors introduces significant latencies. Hence, the clock frequency is decreased dramatically (43.6 MHz) [18]. The pipelining can increase the frequency (150 MHz) [19].…”

Section: Design Comparison (mentioning)

confidence: 99%

“…Hardware encoders apply a number of parallelisation techniques to satisfy real-time requirements. In the literature, there are some architectures developed to accelerate the DCT/IDCT (inverse DCT) computation for H.265/HEVC [6–20]. Some of them do not support all transform sizes [10–12, 14, 18, 20].…”

Section: Introduction (mentioning)

confidence: 99%

“…In the literature, there are some architectures developed to accelerate the DCT/IDCT (inverse DCT) computation for H.265/HEVC [6–20]. Some of them do not support all transform sizes [10–12, 14, 18, 20]. All the designs embed (or assume [10, 17]) the full‐size transposition buffer (32 × 32 samples) implemented either as a register matrix [6–9, 11, 16, 18] or memory modules [8, 13, 15, 19, 20].…”

Section: Introduction (mentioning)

confidence: 99%

“…Some of them do not support all transform sizes [10–12, 14, 18, 20]. All the designs embed (or assume [10, 17]) the full‐size transposition buffer (32 × 32 samples) implemented either as a register matrix [6–9, 11, 16, 18] or memory modules [8, 13, 15, 19, 20]. In the either case, the buffer contributes a significant amount of hardware resources.…”

Section: Introduction (mentioning)

confidence: 99%

Hardware architectures for the H.265/HEVC discrete cosine transform

Pastuszak

2015

IET Image Processing

This study presents a design methodology for the two-dimensional (2D) discrete cosine transform dedicated for H.265/HEVC hardware encoders. The methodology decomposes matrix multiplications for different transform sizes into some steps based on the division of transform units into fixed-size blocks. The modified order of processed blocks allows a significant reduction of the size of the transposition buffer. As a consequence, the resource consumption of the whole 2D-transform architecture is decreased. Separate transform cores assigned to two transform stages increase the throughput more than twice. The decomposition enables different hardware configurations of the architectures. Particularly, the architectures applying the proposed methodology are parametrically specified in VHDL, and configuration parameters enable the tradeoff between resources and the throughput. Furthermore, the interface adaptation to desired horizontal and vertical sizes is possible. The use of regular multipliers allows the support for transforms specified in other video standards. Computational elements embedded in architectures are well-suited to FPGA devices, which improves the area-speed efficiency. Synthesis results show that they can operate at 200 and 400 MHz when implemented in FPGA Arria II and TSMC 90 nm, respectively.

Exploring the concurrent execution of HEVC intra encoding algorithms for heterogeneous multi core architectures

Brandenburg

Stabernack

2015

2015 Conference on Design and Architectures for Signal and Image Processing (DASIP)

By introducing novel algorithms in the emerging high efficiency video coding (HEVC) standard, average bitrate savings of 23% have been achieved in comparison to the H.264/AVC high profile reference encoder for the all intra configuration at the cost of an additional complexity increase. This high algorithmic complexity requires the development and integration of innovative approaches, like different parallelization strategies, heterogeneous designs and/or algorithmic optimizations, resulting in a complex design space. Early design verifications as well as performance evaluations are crucial to realize successful solutions, which meet the functional and non-functional constraints respectively. In this paper we propose a SystemC based heterogeneous multi-core model of an HEVC intra encoder, which is used to explore different design aspects and alternatives. Due to its cycle accurate nature the model is well suited to facilitate various performance evaluations and to drive HW/SW co-optimizations of the explored system, as we will discuss in this paper.
