Cor Meenderinck scite author profile

Parallel Scalability of Video Decoders

An important question is whether emerging and future applications exhibit sufficient parallelism, in particular thread-level parallelism, to exploit the large numbers of cores that future chip multiprocessors (CMPs) are expected to contain. As a case study we investigate the parallelism available in video decoders, an important application domain now and in the future. Specifically, we analyze the parallel scalability of the H.264 decoding process. First we discuss the data structures and dependencies of H.264 and show what types of parallelism can be exploited. We also show that previously proposed parallelization strategies, such as slice-level, frame-level, and intra-frame macroblock (MB) level parallelism, are not sufficiently scalable. Based on the observation that inter-frame dependencies have a limited spatial range, we propose a new parallelization strategy called Dynamic 3D-Wave. It allows certain MBs of consecutive frames to be decoded in parallel. Using this new strategy we analyze the limits to the available MB-level parallelism in H.264. Using real movie sequences we find a maximum MB parallelism ranging from 4000 to 7000. We also perform a case study to assess the practical value and possibilities of a highly parallelized H.264 application. The results show that H.264 exhibits sufficient parallelism to efficiently exploit the capabilities of future manycore CMPs.
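To illustrate why intra-frame MB-level parallelism alone scales poorly, the following minimal sketch counts how many MBs of a single frame can be decoded at the same time. It is not taken from the paper. It assumes the commonly cited intra-frame dependency pattern (each MB waits for its left, up-left, up, and up-right neighbours) and a unit decode time per MB; the function names `wavefront_slots` and `max_parallel_mbs` are hypothetical.

```python
from collections import Counter


def wavefront_slots(width_mbs, height_mbs):
    """Earliest time slot of every MB in one frame.

    Assumption: MB (x, y) depends on its left, up-left, up and up-right
    neighbours, and every MB takes exactly one time slot to decode.
    """
    slot = [[0] * width_mbs for _ in range(height_mbs)]
    for y in range(height_mbs):
        for x in range(width_mbs):
            deps = []
            if x > 0:
                deps.append(slot[y][x - 1])          # left
            if y > 0:
                deps.append(slot[y - 1][x])          # up
                if x > 0:
                    deps.append(slot[y - 1][x - 1])  # up-left
                if x + 1 < width_mbs:
                    deps.append(slot[y - 1][x + 1])  # up-right
            slot[y][x] = 1 + max(deps) if deps else 0
    return slot


def max_parallel_mbs(width_mbs, height_mbs):
    """Largest number of MBs that share the same time slot."""
    slots = wavefront_slots(width_mbs, height_mbs)
    counts = Counter(s for row in slots for s in row)
    return max(counts.values())


# A 1920x1088 frame holds 120 x 68 MBs of 16x16 pixels.
print(max_parallel_mbs(120, 68))
```

Under these assumptions a 1080p frame never offers more than 60 independent MBs, which is far below the 4000 to 7000 reported above for the Dynamic 3D-Wave, where MBs of consecutive frames are also decoded in parallel.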

The SARC Architecture

Ramírez, Cabarcas, Juurlink, et al. 2010. IEEE Micro

The SARC architecture is composed of multiple processor types and a set of user-managed direct memory access (DMA) engines that let the runtime scheduler overlap data transfer and computation. The runtime system automatically allocates tasks on the heterogeneous cores and schedules the data transfers through the DMA engines. SARC's programming model supports various highly parallel applications, with matching support from specialized accelerator processors. On-chip parallel computation shows great promise for scaling raw processing performance within a given power budget. However, chip multiprocessors (CMPs) often struggle with programmability and scalability issues such as cache coherency and off-chip memory bandwidth and latency.
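The benefit of overlapping data transfer with computation can be sketched with a simple timing model. This is an illustrative assumption, not the SARC runtime: it supposes a fixed transfer time and a fixed compute time per task, and a single DMA engine that fetches the data of the next task while the current one runs (double buffering). The function names are hypothetical.

```python
def serial_time(n_tasks, transfer, compute):
    """Total time when every task first transfers its data, then computes."""
    return n_tasks * (transfer + compute)


def overlapped_time(n_tasks, transfer, compute):
    """Total time when the transfer for task i+1 overlaps the compute of task i.

    Assumption: constant per-task times and one transfer in flight at a time.
    """
    if n_tasks == 0:
        return 0
    # First transfer cannot be hidden, last compute cannot be hidden,
    # and each step in between costs the slower of the two stages.
    return transfer + (n_tasks - 1) * max(transfer, compute) + compute


print(serial_time(4, 2, 3), overlapped_time(4, 2, 3))
```

In this model the transfers are fully hidden whenever the compute time per task is at least as long as the transfer time, except for the very first transfer.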

Parallel H.264 Decoding on an Embedded Multicore Processor

Azevedo, Meenderinck, Juurlink, et al. 2009

A Case for Hardware Task Management Support for the StarSS Programming Model

Meenderinck, Juurlink. 2010

A case for hardware task management support for the StarSS programming model. Conference object, Postprint version. This version is available at http://dx.doi.org/10.14279/depositonce-5776.

Suggested Citation: Meenderinck, Cor; Juurlink, Ben: A case for hardware task management support for the StarSS programming model. - In: 2010 13th Euromicro Conference on Digital System Design: Architectures,

Terms of Use: © 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Scalability of Macroblock-level Parallelism for H.264 Decoding

Mesa, Ramírez, Azevedo, et al. 2009

This paper investigates the scalability of macroblock (MB) level parallelization of the H.264 decoder for High Definition (HD) applications. The study has three parts. First, a formal model predicts the maximum performance that can be obtained, taking into account the variable processing time of tasks and thread synchronization overhead. Second, an implementation on a real multiprocessor architecture includes a comparison of different scheduling strategies and a profiling analysis to identify the performance bottlenecks. Finally, a trace-driven simulation methodology is used to identify acceleration opportunities for removing the main bottlenecks, including the acceleration potential for the entropy decoding stage and for thread synchronization and scheduling. Our study presents a quantitative analysis of the main bottlenecks of the application and estimates the acceleration levels that are required to make the MB-level parallel decoder scalable.
