Heterogeneous hardware accelerator architecture for streaming image processing

Pham‐Quoc, Cuong; Al-Ars, Zaid; Bertels, Koen

doi:10.1109/atc.2013.6698140

Cited by 2 publications

(2 citation statements)

References 20 publications

“…The presented system is capable of processing four independent data streams in parallel. A very similar project but intended for images is presented in [14]. The authors of this project keep the data-set of images on their hard drive and send them to the hardware accelerators implemented in FPGA for further processing.…”

Section: FPGAs for Hardware Acceleration of Image and Video Processing · mentioning

confidence: 99%

Embedded platform for local image descriptor based object detection

Kapela, Gugała, Śniatała et al. 2015

Applied Mathematics and Computation

“…Phases 3, 4, 6, and 7 in the NoC-based interconnect system are shorter than in the baseline system due to data movement through the NoC. While all K1 output ($D_{H1(out)}$ and $D_{K1(out)}$) is copied back to the main memory in the baseline system in Phase 3, only part of this output ($D_{H1(out)}$) is copied to the main memory in the NoC-based interconnect system because data output consumed by K2 and K3 ($D_{K1(out)}$) is transferred to K2 and K3 by the NoC in Phase 2 (parallel with K1 execution).…”

Section: Modeling NoC-based Interconnect · mentioning

confidence: 99%

Hybrid Interconnect Design for Heterogeneous Hardware Accelerators

Pham‐Quoc, Heisswolf, Werner et al. 2013

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013

Self Cite
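
The Phase 3 contrast in the citation statement above (the baseline writes all of K1's output back to main memory, while the NoC-based system writes back only the part not already delivered to K2 and K3) can be illustrated as a simple byte count. This is a minimal sketch under stated assumptions: the class and function names, the output sizes, and the bandwidth figure are hypothetical, and the first-order time estimate ignores latency and bus contention.

```python
from dataclasses import dataclass


@dataclass
class K1Output:
    """Sizes (in bytes) of kernel K1's output, split by consumer.

    The split mirrors the citation statement; any concrete sizes are
    made-up values for illustration only.
    """
    to_host: int     # D_H1(out): the part only needed in main memory
    to_kernels: int  # D_K1(out): the part consumed by kernels K2 and K3


def phase3_writeback_bytes(out: K1Output, noc: bool) -> int:
    """Bytes K1 copies back to main memory in Phase 3."""
    if noc:
        # With the NoC, D_K1(out) already reached K2 and K3 during
        # Phase 2, overlapped with K1's execution, so only D_H1(out)
        # is written back.
        return out.to_host
    # Baseline: all of K1's output goes back to main memory.
    return out.to_host + out.to_kernels


def transfer_time_s(nbytes: int, bandwidth_bytes_per_s: float) -> float:
    """First-order transfer time; ignores latency and contention (assumption)."""
    return nbytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    out = K1Output(to_host=4 * 2**20, to_kernels=12 * 2**20)
    bw = 1.6e9  # hypothetical main-memory bandwidth in bytes/s
    for noc in (False, True):
        nbytes = phase3_writeback_bytes(out, noc)
        label = "NoC" if noc else "baseline"
        print(f"{label}: {nbytes} bytes, {transfer_time_s(nbytes, bw) * 1e3:.2f} ms")
```

With these made-up sizes the NoC variant writes back a quarter of the baseline's bytes in Phase 3; the actual saving depends entirely on how K1's output splits between host and kernel consumers.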

Heterogeneous multicore systems are becoming increasingly important as the demand for computing power grows, especially as we enter the big data era. As one of the main trends in heterogeneous multicore design, hardware accelerator systems provide application-specific hardware circuits; they are therefore more energy efficient and achieve higher performance than general-purpose processors, while still providing a large degree of flexibility. However, system performance does not scale with the number of processing cores, because the communication overhead grows rapidly as cores are added. Although data communication is anticipated to be a primary bottleneck for system performance, the design of the interconnect among accelerator kernels has not been well addressed in hardware accelerator systems, which usually rely on a simple bus or shared memory for communication between kernels. In this dissertation, we address the issue of interconnect design for heterogeneous hardware accelerator systems.

There are dependencies among computations, since data produced by one kernel may be needed by another. Data communication patterns can be specific to each application and can call for different types of interconnect. In this dissertation, we use detailed data communication profiling to design an optimized hybrid interconnect that best supports the communication pattern inside an application while keeping the hardware resources used by the interconnect to a minimum.

First, we propose a heuristic-based approach that takes application data communication profiling into account to design a hardware accelerator system with a custom interconnect. Several solutions are considered, including crossbar-based shared local memory, direct memory access (DMA) supporting parallel processing, local buffers, and hardware duplication. This approach is mainly useful for embedded systems, where hardware resources are limited. Second, we propose an automated hybrid interconnect design that uses data communication profiling to define an optimized interconnect for the accelerator kernels of a generic hardware accelerator system. The hybrid interconnect consists of a network-on-chip (NoC), shared local memory, or both. To minimize the hardware resources used by the hybrid interconnect, we also propose an adaptive mapping algorithm that connects the computing kernels and their local memories to the proposed hybrid interconnect. Third, we propose a hardware accelerator architecture that supports streaming image processing. We evaluate each approach by implementing a number of benchmarks on relevant reconfigurable platforms. The experimental results show that our approaches not only improve system performance but also reduce overall energy consumption compared to the baseline systems.
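
The idea of letting a communication profile drive the choice between a NoC and shared local memory can be illustrated with a small sketch. The decision rule here is a hypothetical simplification chosen for illustration, not the heuristic or the adaptive mapping algorithm from the dissertation: edges below a byte threshold stay on the baseline bus through main memory, a heavy edge from a producer with a single heavy consumer uses shared local memory, and a producer feeding several kernels (as K1 feeds K2 and K3 in the citation statement) uses the NoC. Kernel names, byte counts, and the threshold are made up, and the profile is assumed to be already aggregated per producer-consumer pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One profiled producer->consumer transfer: (producer, consumer, bytes).
Edge = Tuple[str, str, int]


def choose_links(profile: List[Edge], threshold: int) -> Dict[Tuple[str, str], str]:
    """Assign a link type to each kernel-to-kernel edge (hypothetical rule).

    - traffic below `threshold` stays on the baseline bus via main memory;
    - a heavy edge from a producer with a single heavy consumer uses
      shared local memory;
    - heavy edges from a producer that feeds several kernels use the NoC.
    """
    # Count how many heavy consumers each producer has.
    heavy_fanout: Dict[str, int] = defaultdict(int)
    for producer, _consumer, nbytes in profile:
        if nbytes >= threshold:
            heavy_fanout[producer] += 1

    links: Dict[Tuple[str, str], str] = {}
    for producer, consumer, nbytes in profile:
        if nbytes < threshold:
            links[(producer, consumer)] = "bus"
        elif heavy_fanout[producer] == 1:
            links[(producer, consumer)] = "shared_local_memory"
        else:
            links[(producer, consumer)] = "noc"
    return links


if __name__ == "__main__":
    profile = [
        ("K1", "K2", 8_000_000),
        ("K1", "K3", 8_000_000),
        ("K2", "K4", 6_000_000),
        ("K3", "K4", 1_000),
    ]
    for edge, link in choose_links(profile, threshold=1_000_000).items():
        print(edge, "->", link)
```

A real design flow would also weigh the hardware cost of each option, which is what the abstract's goal of minimal interconnect resource usage implies; this sketch only looks at traffic volume and fan-out.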
