Towards optimal use of pel decimation to trade off quality for energy

Seidel, Ismael; Bräscher, André Beims; Monteiro, Maurílio de Abreu; Güntzel, José Luís

doi:10.1007/s10470-015-0575-2

Cited by 1 publication

(7 citation statements)

References 53 publications


“…This difference can be decreased to ∼ 19.8% after applying clock gating. However, after simulation, the energy figures are almost identical between this work SSD and the SAD from [15]: while SAD uses 8.34 pJ/block, SSD requires only 8.30 pJ/block. One possible explanation is the use of two different versions of the synthesis tool.…”

Section: A Results (mentioning)

confidence: 66%

“…The smallest reduction was for std (∼ 9.8%), while the largest was for and (∼ 14.4%), thus reducing their area difference to ∼ 17.4%. The SSD architecture using std squarer requires an area ∼ 97% larger than the SAD architecture presented in [15], synthesized for the same standard cell library and the same target throughput. Also, once the SAD architecture FSM from [15] is the same of the presented in this work (Figure 1(b)), the same frequency was used.…”

Section: A Results (mentioning)

confidence: 99%

“…The SSD architecture using std squarer requires an area ∼ 97% larger than the SAD architecture presented in [15], synthesized for the same standard cell library and the same target throughput. Also, once the SAD architecture FSM from [15] is the same of the presented in this work (Figure 1(b)), the same frequency was used. The upper quadrants of Table I present power and energy/SSD 4×4 estimates obtained after the syntheses: the leftmost contains results of the syntheses without clock gating, while the rightmost presents the results where clock gating was applied.…”

Section: A Results (mentioning)

confidence: 99%

“…According to the synthesis results prior to simulation, the std-SSD requires ∼ 50.1% higher energy than the SAD from [15]. This difference can be decreased to ∼ 19.8% after applying clock gating.…”

Section: A Results (mentioning)

confidence: 99%

“…We assumed a target throughput of 16 million 4×4 blocks/s which is the same assumed in [15], necessary to encode full HD content (1920 × 1080) at 30 frames per second [16]. Therefore, the target period was set to 1.84 ns (∼ 543.47…”

Section: A Squarer Designs (mentioning)

confidence: 99%
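
The timing figures quoted above can be cross-checked with a little arithmetic. The sketch below is not taken from the citing paper; the function names are hypothetical, and it assumes only the two numbers given in the excerpt (a 1.84 ns target period and 16 million 4×4 blocks per second).

```python
# Figures quoted in the citation statement above.
TARGET_PERIOD_NS = 1.84        # target clock period in nanoseconds
BLOCKS_PER_SECOND = 16e6       # target throughput in 4x4 blocks per second


def clock_frequency_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1e3 / period_ns


def cycles_per_block(period_ns: float, blocks_per_second: float) -> float:
    """Clock cycles available per block at the given period and throughput."""
    frequency_hz = 1e9 / period_ns
    return frequency_hz / blocks_per_second


if __name__ == "__main__":
    print(f"{clock_frequency_mhz(TARGET_PERIOD_NS):.3f} MHz")
    print(f"{cycles_per_block(TARGET_PERIOD_NS, BLOCKS_PER_SECOND):.2f} cycles per block")
```

Under these assumptions the period corresponds to roughly 543.48 MHz, which leaves about 34 clock cycles for each 4×4 block.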


Squarer exploration for energy-efficient sum of squared differences

Seidel, Monteiro, Güntzel, et al. 2016

2016 IEEE 7th Latin American Symposium on Circuits & Systems (LASCAS)

Self Cite


The main reason for the long encoding time and high energy requirements of state-of-the-art Video Coding (VC) standards, such as HEVC, is the large number of distortion calculations. Among the best known and most widely used distortion metrics is the Sum of Squared Differences (SSD), which correlates strongly with the Peak Signal-to-Noise Ratio (PSNR). Current encoders exploit this correlation to provide a good trade-off between rate and distortion. Since VC is mandatory in current battery-powered devices, the adopted distortion metric must be as energy-efficient as possible. Although simple, the SSD requires a square operation, whose hardware realization is costly. Thus, some VC hardware designs replace the SSD with the Sum of Absolute Differences (SAD). However, using SAD instead of SSD comes at a price in coding efficiency. In this work we investigate four hardware designs for the square operation. Synthesis results for the designed architectures are compared to a reference SAD design from the literature. The best SSD architecture, using clock gating, requires only 20% more energy than SAD.
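
To make the difference between the two metrics concrete, the following is a minimal software sketch of SAD and SSD over a 4×4 block. It is an illustration only, not the hardware architecture studied in the paper; the function names and the sample pixel values are made up for the example.

```python
from typing import List

Block = List[List[int]]  # rows of pixel samples, e.g. a 4x4 block


def sad(original: Block, candidate: Block) -> int:
    """Sum of Absolute Differences: needs only a subtraction and an absolute value per sample."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(original, candidate)
        for a, b in zip(row_a, row_b)
    )


def ssd(original: Block, candidate: Block) -> int:
    """Sum of Squared Differences: needs a square (a multiplication) per sample."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(original, candidate)
        for a, b in zip(row_a, row_b)
    )


if __name__ == "__main__":
    # Hypothetical sample values, chosen only for illustration.
    original = [[100, 100, 100, 100] for _ in range(4)]
    candidate = [[101, 99, 100, 104] for _ in range(4)]
    print(sad(original, candidate))  # 24
    print(ssd(original, candidate))  # 72
```

In this example the single sample that is off by 4 contributes 16 of the 18 units per row to the SSD but only 4 of the 6 units per row to the SAD, which shows how squaring weights large errors more heavily. The squaring step is the operation the abstract identifies as costly in hardware.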

