Andrea Petreto scite author profile

Andrea Petreto

4Publications

25Citation Statements Received

78Citation Statements Given

How they've been cited

How they cite others

Affiliations

Alcen (France), Université Paris Cité, University of Paris-Sud

Publications

Order By: Most citations

Energy and Execution Time Comparison of Optical Flow Algorithms on SIMD and GPU Architectures

Petreto¹,

Hennequin²,

Kœhler³

et al. 2018

View full text Add to dashboard Cite

This article presents and compares optimized implementations of two optical flow algorithms on several target boards comprising multi-core SIMD processors and GPUs. The two algorithms are Horn-Schunck (HS) and TV-L1, and have been chosen because they are both well-known, and because of their different computational complexity and accuracy. For both algorithms, we have made parallel optimized SIMD implementations, while HS has also been implemented on GPUs. For each algorithm, the comparison between the different versions and target boards is carried out in a two-dimensional fashion: in terms of computing speed-in order to achieve real-time computation-and in terms of energy consumption since we target embedded systems. The results show that for HS, the GPUs are the most efficient in both dimensions, able to process in realtime performances (25 frames per second) up to 8Mpix images for 0.35J per image, against 1.8Mpix images for 0.24J per image on CPU. The results also highlight the impact of optimizations on TV-L1: far slower than HS without optimization, it can almost match its performance after optimization on CPU, and can achieve real-time performances with 0.25J for 1.4Mpix images. We hope these results will help developers design optical flow embedded systems.

show abstract

A new SIMD iterative connected component labeling algorithm

Lacassagne

Cabaret

Etiemble

et al. 2016

View full text Add to dashboard Cite

International audienceThis paper presents a new multi-pass iterative algorithm for Connected Component Labeling. The performance of this algorithm is compared to those of State-of-the-Art two-pass direct algorithms. We show that thanks to the parallelism of the SIMD multi-core processors and an activity matrix that avoids useless memory access, such a kind of algorithm has performance that comes closer and closer to direct ones. This new tiled iterative algorithm has been benchmarked on four generations of Intel Xeon processors: 2×4-core Nehalem, 2×12-core Ivy-Bridge, 2×14-core Haswell and 57-core Knight Corner. Macro meta-programming was used to design a unique code for SSE, AVX2 and KNC SIMD instruction set

show abstract

A New Real-Time Embedded Video Denoising Algorithm

Petreto

Romera

Lemaître

et al. 2019

View full text Add to dashboard Cite

Many embedded applications rely on video processing or on video visualization. Noisy video is thus a major issue for such applications. However, video denoising requires a lot of computational effort and most of the state-of-the-art algorithms cannot be run in real-time at camera framerate. This article introduces a new real-time video denoising algorithm for embedded platforms called RTE-VD. We first compare its denoising capabilities with other online and offline algorithms. We show that RTE-VD can achieve real-time performance (25 frames per second) for qHD video (960×540 pixels) on embedded CPUs and the output image quality is comparable to state-of-theart algorithms. In order to reach real-time denoising, we applied several high-level transforms and optimizations (SIMDization, multi-core parallelization, operator fusion and pipelining). We study the relation between computation time and power consumption on several embedded CPUs and show that it is possible to determine different frequency and core configurations in order to minimize either the computation time or the energy.

show abstract

Implementations Impact on Iterative Image Processing for Embedded GPU

Romera

Petreto

Lemaître

et al. 2021

View full text Add to dashboard Cite

The emergence of low-power embedded Graphical Processing Units (GPUs) with high computation capabilities has enabled the integration of image processing chains in a wide variety of embedded systems. Various optimisation techniques are however needed in order to get the most out of an embedded GPU. This paper explores several optimisation methods for iterative stencil-like image processing algorithms on embedded NVIDIA GPUs using the Compute Unified Device Architecture (CUDA) API. We chose to focus our architectural optimisations on the TV-L1 algorithm, an optical flow estimation method based on total variation (TV) regularisation and the L1 norm. It is widely used as a model for more complex optical flow estimations and is used in many recent video processing applications. In this work we evaluate the impact of architecture-oriented optimisations on both execution time and energy consumption on several Nvidia Jetson GPU embedded boards. Results show a speedup up to 3× compared to State-of-the-Art versions as well as a 2.6× decrease in energy consumption.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Andrea Petreto

Energy and Execution Time Comparison of Optical Flow Algorithms on SIMD and GPU Architectures

A new SIMD iterative connected component labeling algorithm

A New Real-Time Embedded Video Denoising Algorithm

Implementations Impact on Iterative Image Processing for Embedded GPU

Contact Info

Product

Resources

About