Wajahat Qadeer scite author profile

Understanding sources of inefficiency in general-purpose chips

Due to their high volume, general-purpose processors, and now chip multiprocessors (CMPs), are much more cost effective than ASICs, but lag significantly in terms of performance and energy efficiency. This paper explores the sources of these performance and energy overheads in general-purpose processing systems by quantifying the overheads of a 720p HD H.264 encoder running on a general-purpose CMP system. It then explores methods to eliminate these overheads by transforming the CPU into a specialized system for H.264 encoding. We evaluate the gains from customizations useful to broad classes of algorithms, such as SIMD units, as well as those specific to a particular computation, such as customized storage and functional units. The ASIC is 500x more energy efficient than our original four-processor CMP. Broadly applicable optimizations improve performance by 10x and energy by 7x. However, the very low energy costs of actual core ops (100s fJ in 90nm) mean that over 90% of the energy used in these solutions is still "overhead". Achieving ASIC-like performance and efficiency requires algorithm-specific optimizations. For each sub-algorithm of H.264, we create a large, specialized functional unit that is capable of executing 100s of operations per instruction. This improves performance and energy by an additional 25x, and the final customized CMP matches an ASIC solution's performance within 3x of its energy and within comparable area.


Rethinking Digital Design: Why Design Must Change

Shacham

Azizi

Wachs

et al. 2010

IEEE Micro


Heterogeneous Wireless Network Management

Qadeer

Rosing

Ankcorn

et al. 2005


Understanding sources of inefficiency in general-purpose chips

et al. 2011


Communications of the ACM, October 2011, Vol. 54, No. 10, p. 85

Abstract: Scaling the performance of a power limited processor requires decreasing the energy expended per instruction executed, since energy/op × op/second is power. To better understand what improvement in processor efficiency is possible, and what must be done to capture it, we quantify the sources of the performance and energy overheads of a 720p HD H.264 encoder running on a general-purpose four-processor CMP system. The initial overheads are large: the CMP was 500× less energy efficient than an Application Specific Integrated Circuit (ASIC) doing the same job. We explore methods to eliminate these overheads by transforming the CPU into a specialized system for H.264 encoding. Broadly applicable optimizations like single instruction, multiple data (SIMD) units improve CMP performance by 14× and energy by 10×, which is still 50× worse than an ASIC. The problem is that the basic operation costs in H.264 are so small that even with a SIMD unit doing over 10 ops per cycle, 90% of the energy is still overhead. Achieving ASIC-like performance and efficiency requires algorithm-specific optimizations. For each subalgorithm of H.264, we create a large, specialized functional/storage unit capable of executing hundreds of operations per instruction. This improves energy efficiency by 160× (instead of 10×), and the final customized CMP reaches the same performance and within 3× of an ASIC solution's energy in comparable area.
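The arithmetic behind the figures in this abstract can be written out as a short worked sketch. The symbols P, E_op, and R_op are notation introduced here for illustration and do not come from the paper; the factors 500, 10, and 160 are the rounded values quoted in the abstract, so the results are approximate.

```latex
\begin{align*}
P &= E_{\text{op}} \cdot R_{\text{op}}
  && \text{(energy/op} \times \text{op/second)} \\
R_{\text{op}} &= \frac{P}{E_{\text{op}}}
  && \text{(throughput at a fixed power budget } P\text{)} \\
\text{energy gap to ASIC after SIMD} &= \frac{500}{10} = 50 \\
\text{energy gap to ASIC after specialized units} &= \frac{500}{160} \approx 3.1
\end{align*}
```

Under these assumptions, the second line shows why a power-limited design can only raise throughput by lowering energy per operation, and the last two lines are consistent with the abstract's "50× worse than an ASIC" and "within 3×" once rounding is taken into account.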


Using a configurable processor generator for computer architecture prototyping

Solomatnikov

Firoozshahian

Shacham

et al. 2009


