Customizing CPU Instructions for Embedded Vision Systems

Piskorski; Lacassagne; Bouaziz; Etiemble

doi:10.1109/camp.2007.4350352

Cited by 11 publications

(5 citation statements)

References 9 publications

“…All the components are implemented in VHDL and without DSP to increase clock working frequency. Half and single precision floating point numeric formats are used in the same way as in [19] and all the units are built with the help of the FloPoCo library [20].…”

Section: Results Of Implementation (mentioning)

confidence: 99%

FPGA Acceleration of the Horn and Schunck Hierarchical Algorithm

Bournias

Chotin

Lacassagne

2021

2021 IEEE International Symposium on Circuits and Systems (ISCAS)

Self Cite

This work proposes a highly tunable motion estimation architecture. We implement the Horn and Schunck algorithm with the hierarchical extension for larger motion estimations in FPGAs. Different architectures are explored dealing with interpolation, pipeline, parallelism and arithmetic format, in order to fit performance. We show in our exploration how the different cores of our system should be used to increase the throughput. Our smallest design achieves 30.8 Mpixel/s at a 1024x1024 resolution, and the fastest achieves 507 Mpixel/s, which is one of the fastest ever achieved, as far as we know, for FPGAs.

“…global and shared_fusion are declined in both 32-bit (F 32 ) and 16-bit (F 16 ) floating point versions. As shown in [28] the use of F 16 is sufficient for optical flow. Furthermore, the original implementation in [10] already uses F 16 for parts of the algorithm on GPU.…”

Section: CUDA GPU Optimisations (mentioning)

confidence: 99%

Implementations Impact on Iterative Image Processing for Embedded GPU

Romera

Petreto

Lemaître

et al. 2021

2021 29th European Signal Processing Conference (EUSIPCO)

Self Cite

The emergence of low-power embedded Graphical Processing Units (GPUs) with high computation capabilities has enabled the integration of image processing chains in a wide variety of embedded systems. Various optimisation techniques are however needed in order to get the most out of an embedded GPU. This paper explores several optimisation methods for iterative stencil-like image processing algorithms on embedded NVIDIA GPUs using the Compute Unified Device Architecture (CUDA) API. We chose to focus our architectural optimisations on the TV-L1 algorithm, an optical flow estimation method based on total variation (TV) regularisation and the L1 norm. It is widely used as a model for more complex optical flow estimations and is used in many recent video processing applications. In this work we evaluate the impact of architecture-oriented optimisations on both execution time and energy consumption on several Nvidia Jetson GPU embedded boards. Results show a speedup up to 3× compared to State-of-the-Art versions as well as a 2.6× decrease in energy consumption.

“…Adding such an instruction has been studied into [29]. [37] for specific domain application [38] but also new dedicated blocks. With a compiler like C2H for Altera FPGA or DIME-C for Xilinx, a complete C function can be compiled into a VHDL block and be directly called inside a C code.…”

Section: SWAR Enhancement (mentioning)

confidence: 99%

“…Such hardware implementation can be much more faster than the sequential execution of the instructions that compose it, as no more ''register to register'' stage is required at each cycle like it is the case for pipeline execution. One of the best example of processor customization (not softcore but ASIP) is the Tensilica Xtensa architecture [37].…”

Section: SWAR Enhancement (mentioning)

confidence: 99%

High performance motion detection: some trends toward new embedded architectures for vision systems

Lacassagne

Manzanera

Denoulet

et al. 2008

J Real-Time Image Proc

Self Cite

The goal of this article is to compare some optimised implementations on current high performance platforms in order to highlight architectural trends in the field of embedded architectures and to get an estimation of what should be the components of a next generation vision system. We present some implementations of robust motion detection algorithms on three architectures: a general purpose RISC processor (the PowerPC G4), a parallel artificial retina dedicated to low-level image processing (Pvlsar34), and the Associative Mesh, a specialized architecture based on associative net. To handle the different aspects and constraints of embedded systems, execution time and power consumption of these architectures are compared.
