Mapping of two-dimensional convolution on very long instruction word media processors for real-time performance

Managuli, Ravi; York, George; Kim, Donglok; Kim, Yongmin

doi:10.1117/1.482755

Cited by 18 publications

(20 citation statements)

References 11 publications

“…While the core processor computes on the data in the current cache block, the DMA controller stores the previously computed block into external memory and brings the next block to be processed from external memory to on-chip memory, thus overlapping the time required to move the data with the computing time. Details of using the DMA controller can be found in [14]. This example illustrates the computational advantage of using boxcar kernels compared to using generalized kernels, especially for the case requiring large kernels/low cutoff frequencies.…”

Section: Emphasis Gain Control (mentioning)

confidence: 96%
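The overlap of data movement and computation described in the statement above is commonly arranged as double (ping-pong) buffering. The following is a minimal sketch that models only the ordering of loads, computes, and stores in sequential Python; it does not run anything in parallel, and the function name, the `compute` callback, and the list-based "memories" are hypothetical and not taken from the cited papers.

```python
def process_with_double_buffering(external_in, block_size, compute):
    """Sequentially model a ping-pong (double-buffered) DMA schedule.

    On real hardware the "DMA" steps below would run concurrently with
    compute(); here they are only ordered the way such a schedule orders them.
    """
    blocks = [external_in[i:i + block_size]
              for i in range(0, len(external_in), block_size)]
    external_out = []
    on_chip = [None, None]   # two on-chip buffers (assumed layout)
    pending_store = None     # result of the previous block, not yet stored

    if blocks:
        on_chip[0] = list(blocks[0])  # prime the pipeline with block 0

    for i in range(len(blocks)):
        cur, nxt = i % 2, (i + 1) % 2
        # DMA side: store the previously computed block ...
        if pending_store is not None:
            external_out.extend(pending_store)
        # ... and fetch the next block into the idle buffer.
        if i + 1 < len(blocks):
            on_chip[nxt] = list(blocks[i + 1])
        # Core side: compute on the current buffer.
        pending_store = compute(on_chip[cur])

    if pending_store is not None:
        external_out.extend(pending_store)  # drain the last result
    return external_out
```

With a per-block function such as `lambda b: [2 * x for x in b]`, the output equals the result of processing the whole input at once, which is the property a correct schedule has to preserve.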

“…For each point in the output image, a total of M² − 1 additions must be performed. An efficient moving average method [14,19] …”

Section: Two-dimensional Boxcar Convolution (mentioning)

confidence: 99%
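The statement above is truncated, so the exact moving average method of [14,19] is not shown here. As an illustration only, the textbook running-sum approach replaces the M² − 1 additions per output of a direct M × M boxcar with a constant number of additions and subtractions per output, by filtering rows and then columns. The function names below are hypothetical, and only the valid (fully overlapping) region is computed.

```python
def boxcar_sum_1d(row, m):
    """Sliding-window sums of width m (valid region only)."""
    if m <= 0 or m > len(row):
        return []
    acc = sum(row[:m])          # first window, computed directly
    out = [acc]
    for i in range(m, len(row)):
        acc += row[i] - row[i - m]  # add entering sample, drop leaving one
        out.append(acc)
    return out


def boxcar_sum_2d(image, m):
    """m x m boxcar sums via a row pass followed by a column pass."""
    row_sums = [boxcar_sum_1d(r, m) for r in image]
    # Transpose, run the same 1-D pass down each column, transpose back.
    col_sums = [boxcar_sum_1d(list(c), m) for c in zip(*row_sums)]
    return [list(r) for r in zip(*col_sums)]


def boxcar_sum_2d_direct(image, m):
    """Reference: direct summation of every m x m window."""
    h, w = len(image), len(image[0])
    return [[sum(image[y + dy][x + dx] for dy in range(m) for dx in range(m))
             for x in range(w - m + 1)]
            for y in range(h - m + 1)]
```

Dividing each sum by M² gives the boxcar mean. The direct version is included only so the two results can be compared on small inputs.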

“…Example mediaprocessors are TTI TriMedia, Texas Instruments TMS320C64x, Hitachi/Equator Technologies MAP-CA, and Intel Pentium 4. By optimally mapping the algorithm to the underlying processor architecture, high performance can be achieved [14,15]. In addition to high performance, these mediaprocessors can be programmed in C with intrinsics and provide smart compilers for higher software development productivity.…”

Section: Introduction (mentioning)

confidence: 99%

Fast unsharp masking on a programmable mediaprocessor

Bae, Managuli, Shamdasani, et al. 2002

Medical Imaging 2002: Visualization, Image-Guided Procedures, and Display

Self Cite

Unsharp masking is a widely used image enhancement method in medical imaging, e.g., in computed radiography, digital radiography, and digital mammography. It mainly consists of 3 steps: (1) convolving an input image with a lowpass filter, (2) obtaining a highpass-filtered image by subtracting the lowpass-filtered image from the original image, and (3) adding the weighted highpass-filtered image to the original image. It is computationally expensive, e.g., convolving a 2k x 2k image with a 21 x 21 kernel alone requires about 3.7 billion arithmetic operations. To support this high computational demand for unsharp masking, hardware-based solutions using ASIC, FPGA and FPLD have been developed and used. While they have reasonably met the computing requirement, they suffer from limited flexibility. On the other hand, software solutions using programmable processors are more flexible and can easily change algorithmic parameters, such as filter kernel size, and incorporate new features, but they have not been able to meet the fast computing requirement. Modern programmable mediaprocessors, such as MAP-CA and Texas Instruments TMS320C64x, can meet both fast computing and flexibility requirements due to their high computing power and full programmability. In this paper, we present an efficient implementation of adaptive unsharp masking on a MAP-CA mediaprocessor. For a 2k x 2k 16-bit image, our adaptive unsharp masking operation with a 149 x 149 boxcar kernel takes only 300 ms. This fast unsharp masking not only reduces the overall processing time in imaging modalities, but also allows the operator to adjust the selected parameters interactively for optimal image quality. Our implementation on the MAP-CA can be easily extended to other high-performance mediaprocessors, such as TMS320C64x and Pentium 4.
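The three steps listed in the abstract can be sketched as follows. This is a plain, fixed-gain version with a direct boxcar mean as the lowpass filter, not the adaptive MAP-CA implementation the paper describes; the function names, the edge-replication border handling, and the `gain` parameter are assumptions made for illustration. The operation count helper assumes that "2k" means 2048 and that each kernel tap costs one multiply and one add, which reproduces the "about 3.7 billion" figure.

```python
def box_mean(image, m):
    """Direct m x m boxcar mean (m odd), replicating edge pixels."""
    h, w = len(image), len(image[0])
    r = m // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)      # clamp row index
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += image[yy][xx]
            out[y][x] = total / (m * m)
    return out


def unsharp_mask(image, m, gain):
    """Step 1: lowpass. Step 2: highpass = original - lowpass.
    Step 3: output = original + gain * highpass."""
    low = box_mean(image, m)
    return [[p + gain * (p - l) for p, l in zip(prow, lrow)]
            for prow, lrow in zip(image, low)]


def direct_convolution_ops(width, height, kernel):
    """Multiplies plus adds for direct 2-D convolution (one of each per tap)."""
    return width * height * kernel * kernel * 2
```

Under the stated assumptions, `direct_convolution_ops(2048, 2048, 21)` gives 3,699,376,128, consistent with the abstract's estimate. A flat image passes through `unsharp_mask` unchanged, because its highpass component is zero.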

“…The IGALU also supports powerful instructions for filtering operations used extensively in image and video applications. For example, a single inner-product instruction can perform eight 16-bit multiplications in parallel, summing the results into a 32-bit output (Managuli et al, 2000). The two clusters together are capable of executing 4 different instructions (e.g., 2 on IALUs and 2 on IGALUs) in each clock cycle.…”

Section: Implementation On a Mediaprocessor (mentioning)

confidence: 99%
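The arithmetic of the inner-product instruction mentioned above can be modeled in a few lines. This is only a behavioral sketch: the function name is hypothetical, the operands are assumed to be signed 16-bit values, and overflow of the 32-bit result is assumed to wrap, which the statement does not specify.

```python
def inner_product_8x16(a, b):
    """Model eight 16-bit multiplies summed into one 32-bit result.

    Assumptions: signed 16-bit inputs, two's-complement wraparound
    on the 32-bit accumulator.
    """
    if len(a) != 8 or len(b) != 8:
        raise ValueError("expected two vectors of eight elements")
    for v in (*a, *b):
        if not -32768 <= v <= 32767:
            raise ValueError("operand does not fit in signed 16 bits")
    total = sum(x * y for x, y in zip(a, b)) & 0xFFFFFFFF  # keep low 32 bits
    # Reinterpret the low 32 bits as a signed value.
    return total - (1 << 32) if total & 0x80000000 else total
```

One call covers eight taps of a filter kernel, which is why instructions of this kind suit convolution: an 8-tap row of a kernel reduces to a single operation per output sample.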

Real‐time video postprocessing for deblocking and deringing on mediaprocessors

Gao, Mermer, Kim 2003

Int J Imaging Syst Tech

Blocking and ringing are two major artifacts in highly compressed images and videos coded by block-based discrete cosine transform. Many existing deblocking and deringing algorithms are computationally very expensive and/or cannot produce satisfactory results at very low bit rates. We have developed an adaptive deblocking algorithm and a clustering-based deringing algorithm. These algorithms can smooth out the blocking and ringing artifacts while preserving the strong edges and texture areas. In addition, our postprocessing algorithms have low computational cost. We have implemented them on a commercially available, programmable processor. For an image size of 352 × 288, our algorithms take only 3.17 ms, demonstrating the feasibility of real-time video postprocessing.

“…Data flow programming in mediaprocessors is typically handled in two ways: programmable DMA (direct memory access) engines and cache prefetching [7]. Implementing data flow using a DMA engine involves software development around the core computation tight loop.…”

Section: Universal Data Flow Code Generator (mentioning)

confidence: 99%

Media processor programming interface to increase the portability of media processor software

et al. 2001

The architecture of mediaprocessors has become increasingly sophisticated to accommodate the need for more performance in processing various media data. However, due to the inability of mediaprocessor compilers to fully detect the parallelism available in a program and maximize the utilization of the mediaprocessor's on-chip resources, C intrinsics, which are hints to the compiler on which assembly instructions to use, have been employed to achieve better performance. Nonetheless, these intrinsics are mediaprocessor-dependent, thus limiting the portability of mediaprocessor software. To help increase the portability of mediaprocessor software, we have developed a Mediaprocessor Programming Interface (MPI), which translates one set of C intrinsics into another. In many cases, the translated code for the target mediaprocessor has similar performance to the code developed with native intrinsics. We believe that the MPI can facilitate the reuse of mediaprocessor software as well as the development of mediaprocessor-independent software.
