Video algorithms (e.g., H.264, MPEG2/4) require a tremendous amount of computation power and data bandwidth. The complexity depends on whether the codec is encoding or decoding, the video standard, the resolution, the frame rate, and the visual quality constraints. Many video architectures use multiple processing elements (e.g., multiple DSPs or MCUs, or a DSP/MCU with dedicated accelerators or an FPGA) to meet the high computation requirements of video algorithms. These architectures pose new challenges for video software, which is typically designed to run on a single processor. This paper presents a software design for a video architecture that uses parallel processing elements. It explains the following aspects in detail: a) software partitioning, b) algorithm-specific optimizations, c) processor-specific optimizations, d) efficient DMA/cache usage, and e) concurrent scheduling of all parallel processing elements. The approach is illustrated with an MPEG4 encoder on the TMS320DM6446, a DaVinci™ family device from Texas Instruments Ltd. The software architecture scales to various video standards (e.g., H.264, MPEG2/4) and to various parallel processing hardware solutions. The software achieves D1 at 30 fps on this device at less than 50% DSP load.

Index terms: architecture, video coding, parallel processing elements, MPEG4, H.264, DaVinci™.
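To give a rough sense of what the D1 at 30 fps figure implies, the sketch below estimates the per-macroblock cycle budget on the DSP. It is only a back-of-the-envelope illustration, not the authors' method. The abstract does not state the D1 frame size or the DSP clock rate: the 720x480 frame and the 594 MHz clock used here are assumptions, as are the function names.

```python
MB_SIZE = 16  # an MPEG4 macroblock covers 16x16 luma samples


def macroblocks_per_frame(width, height):
    """Count macroblocks in a frame, rounding up at the frame edges."""
    mbs_wide = (width + MB_SIZE - 1) // MB_SIZE
    mbs_high = (height + MB_SIZE - 1) // MB_SIZE
    return mbs_wide * mbs_high


def cycles_per_macroblock(width, height, fps, dsp_hz, load_fraction):
    """DSP cycles available per macroblock at the given load fraction."""
    mb_per_second = macroblocks_per_frame(width, height) * fps
    return dsp_hz * load_fraction / mb_per_second


if __name__ == "__main__":
    # Assumed values (not given in the abstract): 720x480 D1, 594 MHz DSP.
    width, height, fps = 720, 480, 30
    dsp_hz = 594_000_000
    budget = cycles_per_macroblock(width, height, fps, dsp_hz, 0.5)
    print(macroblocks_per_frame(width, height), "macroblocks per frame")
    print(round(budget), "DSP cycles per macroblock at 50% load")
```

Under these assumptions the DSP has roughly 7,300 cycles for each macroblock, which suggests why the partitioning across processing elements and the DMA/cache usage listed in the abstract matter.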