Low level source code optimizing for single/multi/core digital signal processors

Fryza, Tomáš; Mego, Roman

doi:10.1109/radioelek.2013.6530933

Cited by 7 publications

(5 citation statements)

References 2 publications


“…For illustration, the achievable results are presented on a comprehensible 4-point radix-2 decimation-in-time FFT with complex input [23]. Thanks to the optimizations from [24], the 4-point version contains only addition and subtraction operations. The part of the algorithm description without signal definitions is shown in Listing 1 and, for better understanding, it can be visualized by the generated DOT file [25] (see Fig.…”

Section: Basic Behavior (mentioning)

confidence: 99%
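The 4-point case from the statement above can be sketched in C. This is a minimal illustration, not the code from [23] or [24]; the `cplx` type and the `fft4_dit` name are hypothetical. It shows why the kernel needs only additions and subtractions: the single nontrivial twiddle factor, W4^1 = -j, reduces to swapping the real and imaginary parts and changing one sign.

```c
/* Hypothetical single-precision complex sample type (not from the source). */
typedef struct { float re, im; } cplx;

/* Minimal sketch of a 4-point radix-2 decimation-in-time FFT.
 * Only additions and subtractions are used: multiplying by W4^1 = -j
 * is done by swapping real/imaginary parts and negating one of them. */
void fft4_dit(const cplx x[4], cplx X[4])
{
    /* First stage: 2-point butterflies on even (x0, x2) and odd (x1, x3) samples */
    cplx e0 = { x[0].re + x[2].re, x[0].im + x[2].im };
    cplx e1 = { x[0].re - x[2].re, x[0].im - x[2].im };
    cplx o0 = { x[1].re + x[3].re, x[1].im + x[3].im };
    cplx o1 = { x[1].re - x[3].re, x[1].im - x[3].im };

    /* Second stage, twiddle W4^0 = 1: plain sum and difference */
    X[0].re = e0.re + o0.re;  X[0].im = e0.im + o0.im;
    X[2].re = e0.re - o0.re;  X[2].im = e0.im - o0.im;

    /* Second stage, twiddle W4^1 = -j: (-j)(a + jb) = b - ja */
    X[1].re = e1.re + o1.im;  X[1].im = e1.im - o1.re;
    X[3].re = e1.re - o1.im;  X[3].im = e1.im + o1.re;
}
```

For the real input {1, 2, 3, 4} this yields X = {10, -2+2j, -2, -2-2j}, which matches the direct DFT definition.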

Instruction mapping techniques for processors with very long instruction word architectures

Mego

Fryza

2022

Journal of Electrical Engineering


This paper presents an instruction mapping technique for generating low-level assembly code for digital signal processing algorithms. This technique helps developers implement retargetable kernel functions with the performance benefits of low-level assembly languages. The approach is aimed at very long instruction word (VLIW) architectures, which benefit the most from the proposed method. Mapped algorithms are described by signal-flow graphs, which are used to find possible parallel operations. The algorithm is converted into low-level code and mapped to the target architecture. This process also introduces an optimization of the instruction mapping priority, which leads to more efficient code. The technique was verified on selected kernels and compared with common programming methods, which showed that it is suitable for VLIW architectures and portable to other systems.
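The idea of finding parallel operations in a signal-flow graph and packing them into VLIW instruction packets can be illustrated with a simple greedy list scheduler. This is a sketch under stated assumptions, not the authors' mapping tool: operations are listed in topological order, each has at most two predecessors, every operation has a one-cycle latency, and any unit can execute any operation. The `sfg_op` type and `schedule_vliw` function are hypothetical, and program order stands in for the mapping priority that the paper optimizes.

```c
#define MAX_OPS 32

/* Hypothetical signal-flow graph node: indices of up to two predecessor
 * operations, or -1 when an operand comes directly from an input signal.
 * Ops must be in topological order (every predecessor index < own index). */
typedef struct { int pred[2]; } sfg_op;

/* Greedy list scheduling onto a VLIW core issuing at most `width`
 * operations per cycle, assuming one-cycle latency for every op.
 * Stores the issue cycle of each op in cycle[] and returns the number
 * of cycles (instruction packets), or -1 on invalid arguments. */
int schedule_vliw(const sfg_op *ops, int n, int width, int *cycle)
{
    int used[MAX_OPS] = {0};  /* number of ops already placed in each cycle */
    int length = 0;
    if (n < 0 || n > MAX_OPS || width < 1)
        return -1;
    for (int i = 0; i < n; i++) {
        int earliest = 0;
        for (int k = 0; k < 2; k++) {
            int p = ops[i].pred[k];
            if (p >= 0 && cycle[p] + 1 > earliest)
                earliest = cycle[p] + 1;   /* wait until the operand is ready */
        }
        while (used[earliest] >= width)    /* packet full: try the next cycle */
            earliest++;
        cycle[i] = earliest;
        used[earliest]++;
        if (earliest + 1 > length)
            length = earliest + 1;
    }
    return length;
}
```

Four independent operations on a two-wide core fit into two packets, whereas a chain of three dependent operations needs three cycles regardless of the issue width. This dependency limit is why a graph-based description, which exposes independent operations explicitly, matters on VLIW targets.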



“…The code for the floating-point data type does not do that, because floating-point operations take more instruction cycles to execute. For comparison with the hand-optimized code from [4], the hand-optimized 4-point FFT with single-precision complex input takes 24 instruction cycles and the average unit load is about 30%. The hand-optimized 8-point FFT takes 42 instruction cycles and the unit load is about 55%.…”

Section: -Point FFT (mentioning)

confidence: 99%
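The quoted cycle counts and unit loads can be related through a simple ratio. This sketch assumes that "average unit load" means occupied issue slots divided by all available slots (cycles × functional units) and that the core has eight functional units, as a TI C66x core does (.L1, .L2, .S1, .S2, .M1, .M2, .D1, .D2). The cited work may define the metric differently, and `unit_load` is a hypothetical helper.

```c
/* Average functional-unit load of a VLIW schedule, ASSUMING it is defined
 * as occupied issue slots / (cycles * units). Returns 0.0 for empty input. */
double unit_load(int occupied_slots, int cycles, int units)
{
    if (cycles <= 0 || units <= 0)
        return 0.0;
    return (double)occupied_slots / ((double)cycles * units);
}
```

Under this assumption, 24 cycles on 8 units give 192 slots, so a load of about 30% corresponds to roughly 58 occupied slots. The 8-point case (42 cycles, about 55%) corresponds to roughly 185 of 336 slots.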

“…There are also frameworks with increased efficiency [3] that can work with VLIW architectures as well. The difference between hand-optimized code and code compiled from a high-level language can still be significant when a signal processing algorithm is implemented on any VLIW processor [4].…”

Section: Introduction (mentioning)

confidence: 99%

Efficiency of the signal processing algorithms using signal-flow based mapping tool

Mego

Fryza

2015

2015 25th International Conference Radioelektronika (RADIOELEKTRONIKA)


This paper deals with the implementation of signal processing algorithms, specifically the Fast Fourier Transform and matrix multiplication, using a new tool for mapping instructions onto the functional units of the processor. The tool uses a signal-flow based description of the algorithm instead of a sequential notation of the program execution. The selected target processor is a multi-core digital signal processor based on the very long instruction word architecture. The final assembly code is analyzed in terms of the utilization of the functional units and general-purpose registers.
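For reference, the matrix multiplication kernel mentioned above is shown here in its plain sequential C form. This is not the paper's signal-flow description. It assumes row-major `float` matrices, and `matmul_ref` is a hypothetical name. Every output element is an independent multiply-accumulate chain, which is the parallelism a mapping tool can distribute across functional units.

```c
/* Reference (sequential) matrix multiplication C = A * B, where A is n x m,
 * B is m x p and C is n x p, all stored row-major. Each C[i][j] is an
 * independent multiply-accumulate chain, so different (i, j) pairs could be
 * issued in parallel on separate functional units. */
void matmul_ref(const float *A, const float *B, float *C, int n, int m, int p)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < p; j++) {
            float acc = 0.0f;
            for (int k = 0; k < m; k++)
                acc += A[i * m + k] * B[k * p + j];  /* multiply-accumulate */
            C[i * p + j] = acc;
        }
}
```

The accumulator `acc` shows the register-pressure side of the analysis: keeping several independent (i, j) chains in flight at once needs one accumulator register per chain.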


“…Although the implementation can achieve high efficiency, even a slightly unreasonable program design can lead to a sharp decline in performance. Richard Prokesch [4] concludes that if runtime predictability is important, manual parallelization should be used, because the management of the worker cores is then controlled by the developer.…”

Section: OpenMP (mentioning)

confidence: 99%
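Manual parallelization, as opposed to a runtime scheduler, can be illustrated with a static row partition. This is a minimal sketch, and `core_row_range` is a hypothetical helper. Each core derives its own fixed block of rows from its index, so the work assignment is identical on every run. How a core obtains its index on a real multicore DSP is platform-specific and outside this sketch.

```c
/* Static, developer-controlled work split: core `core` (0-based) out of
 * `num_cores` processes rows [*first, *last). The result depends only on
 * the arguments, so the same rows always go to the same core, which keeps
 * the per-core runtime predictable. */
void core_row_range(int rows, int num_cores, int core, int *first, int *last)
{
    /* 64-bit intermediate avoids overflow of core * rows */
    *first = (int)((long long)core * rows / num_cores);
    *last  = (int)((long long)(core + 1) * rows / num_cores);
}
```

Splitting 10 rows among 4 cores, for example, gives the ranges [0, 2), [2, 5), [5, 7) and [7, 10). The ranges are contiguous, non-overlapping, and differ in size by at most one row.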

Parallel Programming and Optimization Based on TMS320C6678

Mou

Wei

Zhang

2014

AMM


The development of multi-core processors has provided a good solution for applications that require real-time processing and a large number of calculations. However, simply exploiting parallelism in software is not enough to make full use of the hardware performance. This paper studies parallel programming and optimization techniques on TMS320C6678 multicore digital signal processors. We first illustrate an implementation of a selected parallel image convolution algorithm using OpenMP. Then several optimization techniques, such as compiler intrinsics, cache, and DMA, are used to further enhance the application performance, and the test results show a good execution time.
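An OpenMP-parallelized image convolution of the kind described above can be sketched in C. This is not the paper's code. It assumes a row-major `float` image, a square kernel, and "valid" output size, and `conv2d_valid` is a hypothetical name. The intrinsic, cache, and DMA optimizations mentioned in the abstract are not shown.

```c
/* 2-D "valid" convolution of a rows x cols image with a k x k kernel.
 * The output is (rows-k+1) x (cols-k+1), row-major. Output rows are
 * independent, so OpenMP shares the outer loop among threads; compiled
 * without OpenMP support, the pragma is ignored and the loop runs
 * serially with the same result. */
void conv2d_valid(const float *img, int rows, int cols,
                  const float *ker, int k, float *out)
{
    int out_rows = rows - k + 1, out_cols = cols - k + 1;
    #pragma omp parallel for
    for (int r = 0; r < out_rows; r++)
        for (int c = 0; c < out_cols; c++) {
            float acc = 0.0f;
            for (int i = 0; i < k; i++)
                for (int j = 0; j < k; j++)
                    /* flipped kernel index: true convolution, not correlation */
                    acc += img[(r + i) * cols + (c + j)]
                         * ker[(k - 1 - i) * k + (k - 1 - j)];
            out[r * out_cols + c] = acc;
        }
}
```

Parallelizing only the outermost loop keeps each thread's writes to distinct output rows, so no synchronization is needed inside the loop body.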
