Fast analysis of hardware/software trade-offs for cost-, performance- and power-constrained embedded systems is key to reducing time to market while also improving the quality of results. However, this analysis must also closely match the results of the detailed HW and SW implementation if it is to lead to an optimal solution. This requires compilation (for SW) and synthesis (for HW) techniques that guarantee a solution with the estimated cost exists, and whose estimates are not far from what manual optimization and detailed design will later achieve. We start from a realistic application domain, namely sound-triggered wireless security cameras, and show in detail how one can start from an algorithm modeled and validated in Simulink and use commercial state-of-the-art tools to explore various hardware and software implementations of the frequency-based audio detection front end against the overall design constraints and goals. We show how estimates of the various components of the cost function can be obtained rapidly, directly from the C code generated by Simulink, with a few manual refinements that increase the efficiency of both the software and hardware implementations and bring them closer to the final optimized implementation. We report results for several points in the design space. The results we obtained are close to hand-optimized implementations for both HW and SW, showing that the approach supports trade-off analysis in a very short time and that further manual optimization can quickly lead to the best implementation.
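Comparing "different points in the design space" usually means discarding implementations that are worse on every metric at once. The sketch below is a minimal illustration of that step, assuming each candidate HW/SW partition has already been estimated for latency, added hardware area and power; the `DesignPoint` type, the metric names, the latency constraint and all numbers are hypothetical and are not results from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    name: str
    latency_ms: float  # hypothetical per-frame latency estimate
    area: float        # hypothetical added hardware area (arbitrary units)
    power_mw: float    # hypothetical average power estimate


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """a dominates b if it is no worse on every metric and better on at least one."""
    ka = (a.latency_ms, a.area, a.power_mw)
    kb = (b.latency_ms, b.area, b.power_mw)
    return all(x <= y for x, y in zip(ka, kb)) and any(x < y for x, y in zip(ka, kb))


def pareto_front(points, max_latency_ms):
    """Keep points that meet the latency constraint and are not dominated."""
    feasible = [p for p in points if p.latency_ms <= max_latency_ms]
    return [p for p in feasible
            if not any(dominates(q, p) for q in feasible if q is not p)]


# Hypothetical numbers for illustration only; not measured results.
candidates = [
    DesignPoint("sw_only", 40.0, 0.0, 5.0),
    DesignPoint("hw_partial", 10.0, 50.0, 8.0),
    DesignPoint("hw_full", 4.0, 120.0, 12.0),
    DesignPoint("hw_partial_slow", 45.0, 60.0, 9.0),
]
front = pareto_front(candidates, max_latency_ms=50.0)
print([p.name for p in front])  # prints ['sw_only', 'hw_partial', 'hw_full']
```

Here `hw_partial_slow` is dropped because `sw_only` is at least as good on every metric; tightening `max_latency_ms` removes the software-only point as well, which is the kind of constraint-driven trade-off the estimates are meant to expose.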
The ever-increasing use of digital video applications such as video telephony, broadcasting and the storage of high- and ultra-high-definition video has driven the development of video coding standards. The state-of-the-art video coding standard is High Efficiency Video Coding (HEVC), also known as H.265. It promises to reduce the bit rate by about 50 percent relative to its predecessor, H.264, at the same visual quality. H.265 thus provides a significant improvement in compression at the expense of computational complexity. The HEVC encoder is very complex, and Motion Estimation (ME) accounts for about 50 percent of the encoding. For ME, it uses the Test Zone (TZ) fast search algorithm, which compares a block of pixels with a few selected blocks in the search region of a reference frame. However, the resulting encoding time is too long for real-time video applications. There is therefore a need for a search algorithm that gives results comparable to TZ search while saving a substantial amount of time. In this paper, we study the effects of a meta-heuristic algorithm on motion estimation. One algorithm suited to this task is the Firefly Algorithm (FA), which is inspired by the social behavior of fireflies and is generally used to solve optimization problems. Our results show that using FA for ME saves a considerable amount of time with comparable encoding efficiency.
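To make the idea concrete, the sketch below applies the standard firefly update rule to block matching: each firefly is a candidate integer motion vector, its brightness is the negative sum of absolute differences (SAD), and dimmer fireflies move toward brighter ones with attractiveness `beta0 * exp(-gamma * r^2)` plus a small random step. This is a minimal, hypothetical illustration, not the paper's implementation; the function names, population size, iteration count, `beta0`, `gamma`, `alpha`, the frame contents and the choice to seed the zero vector are all assumptions.

```python
import math
import random


def sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between the bs x bs block of cur at (bx, by) and the block of ref
    displaced by the motion vector (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def firefly_me(cur, ref, bx, by, bs, search_range,
               n_flies=8, iters=10, beta0=1.0, gamma=0.05, alpha=2.0, seed=0):
    """Hypothetical firefly-style search for one block.
    Returns (best motion vector, its SAD, number of distinct SADs computed)."""
    rnd = random.Random(seed)
    h, w = len(ref), len(ref[0])

    def clamp(dx, dy):
        # Stay inside the search window, then inside the reference frame.
        dx = max(-search_range, min(search_range, dx))
        dy = max(-search_range, min(search_range, dy))
        dx = max(-bx, min(w - bs - bx, dx))
        dy = max(-by, min(h - bs - by, dy))
        return dx, dy

    cache = {}  # motion vector -> SAD; len(cache) counts block comparisons

    def cost(mv):
        if mv not in cache:
            cache[mv] = sad(cur, ref, bx, by, mv[0], mv[1], bs)
        return cache[mv]

    # Random initial population, with the zero vector as a safe candidate.
    flies = [clamp(rnd.randint(-search_range, search_range),
                   rnd.randint(-search_range, search_range))
             for _ in range(n_flies)]
    flies[0] = (0, 0)
    for f in flies:
        cost(f)

    for _ in range(iters):
        for i in range(n_flies):
            for j in range(n_flies):
                if cost(flies[j]) < cost(flies[i]):  # firefly j is brighter
                    (xi, yi), (xj, yj) = flies[i], flies[j]
                    # Attractiveness decays with squared distance.
                    beta = beta0 * math.exp(-gamma * ((xj - xi) ** 2 + (yj - yi) ** 2))
                    nx = xi + beta * (xj - xi) + alpha * (rnd.random() - 0.5)
                    ny = yi + beta * (yj - yi) + alpha * (rnd.random() - 0.5)
                    flies[i] = clamp(round(nx), round(ny))
                    cost(flies[i])

    best = min(cache, key=cache.get)
    return best, cache[best], len(cache)


# Synthetic frames: cur is ref shifted so the true motion vector is (3, -2).
def f(x, y):
    return (x * x + 2 * y * y) // 4


ref = [[f(x, y) for x in range(32)] for y in range(32)]
cur = [[f(x + 3, y - 2) for x in range(32)] for y in range(32)]
mv, best_sad, n_evals = firefly_me(cur, ref, bx=12, by=12, bs=8, search_range=6)
print("motion vector:", mv, "SAD:", best_sad, "block comparisons:", n_evals)
```

An exhaustive search over a ±6 window would need (2·6+1)² = 169 block comparisons; the firefly search evaluates only the vectors its population visits, which is where the time saving comes from, at the risk of settling on a local minimum.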
The pseudo-log image transform belongs to a class of image processing kernels that generate memory references which are nonlinear functions of the loop indices. Because of this nonlinearity, the usual design methodologies do not allow efficient hardware implementation of such kernels. An optimized hardware implementation of these kernels requires a customized memory hierarchy and an efficient data/memory management strategy. We present the design and real-time hardware implementation of a pseudo-log image transform IP (a hardware image processing engine) using a memory management framework. The framework generates a controller that efficiently manages the movement of input data, in the form of tiles, between off-chip main memory, on-chip memory, and the core processing unit. The framework can jointly optimize the memory hierarchy and the tile computation schedule to reduce on-chip memory requirements, maximize throughput, and increase data reuse so as to reduce off-chip memory bandwidth requirements. The algorithmic C++ description of the pseudo-log kernel is profiled in the framework to generate an enhanced description with a customized memory hierarchy. This enhanced description is then used for high-level synthesis (HLS) to perform architectural design space exploration and find an optimal implementation under given performance constraints. The optimized register transfer level (RTL) implementation of the IP produced by HLS is used for performance estimation, which is carried out in a simulation framework that characterizes the IP under different external off-chip memory latencies and a variety of data transfer policies. Experimental results show that the designed IP can be used for real-time implementation and that the generated memory hierarchy can feed the IP with sufficiently high bandwidth even in the presence of long external memory latencies.
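Why nonlinear references complicate tiling can be seen by computing the input footprint of each output tile. The exact pseudo-log formula is not given here, so the sketch below substitutes a hypothetical separable exponential index mapping (a log-style transform's output-to-input mapping grows exponentially) and sizes an on-chip tile buffer from the worst-case footprint. The function names, `scale`, image size and tile size are assumptions for illustration, not the framework's actual analysis.

```python
import math


def src_index(u: int, scale: float) -> int:
    # Hypothetical stand-in for a pseudo-log mapping: output index u reads
    # input index floor(exp(u / scale) - 1). NOT the paper's exact formula.
    return int(math.floor(math.exp(u / scale) - 1.0))


def tile_footprint(u0: int, u1: int, scale: float) -> tuple:
    # Inclusive input range touched by output indices u0 .. u1 - 1.
    # The mapping is monotonic, so the two endpoints bound the footprint.
    return src_index(u0, scale), src_index(u1 - 1, scale)


def footprints_2d(out_w: int, out_h: int, tile: int, scale: float) -> dict:
    # Separable mapping: the 2-D footprint of an output tile is the product
    # of its per-axis input ranges. Keyed by the tile's top-left output corner.
    boxes = {}
    for ty in range(0, out_h, tile):
        for tx in range(0, out_w, tile):
            x0, x1 = tile_footprint(tx, min(tx + tile, out_w), scale)
            y0, y1 = tile_footprint(ty, min(ty + tile, out_h), scale)
            boxes[(tx, ty)] = (x0, x1, y0, y1)
    return boxes


boxes = footprints_2d(64, 64, tile=16, scale=16.0)
# The on-chip buffer must hold the largest input footprint of any output tile.
buffer_words = max((x1 - x0 + 1) * (y1 - y0 + 1)
                   for x0, x1, y0, y1 in boxes.values())
for tx in range(0, 64, 16):
    x0, x1, _, _ = boxes[(tx, 0)]
    print(f"output tile x={tx:2d}: reads input columns {x0}..{x1}")
print("worst-case tile buffer (pixels):", buffer_words)
```

Every output tile has the same size, yet with this hypothetical mapping the input footprint grows from tile to tile, so a fixed, linear tiling scheme either over-allocates on-chip memory for most tiles or under-serves the last ones; this is the motivation for a customized memory hierarchy and tile schedule.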