Hybrid Interconnect Design for Heterogeneous Hardware Accelerators

Pham‐Quoc, Cuong; Heisswolf, Jan; Werner, Stephan; Al-Ars, Zaid; Becker, J.; Bertels, Koen

doi:10.7873/date.2013.178

Cited by 12 publications

(5 citation statements)

References 98 publications

(159 reference statements)


“…Upon optimizing and generating HDL-based hardware descriptions for the four accelerator cores, we proceed to construct our testing system based on the hardware accelerator computing paradigm [3]. In our case study, we employ the Xilinx MPSoC Ultra96v2 edge computing board [20] as the evaluation platform.…”

Section: System Implementation (mentioning)

confidence: 99%

“…Subsequently, we employ High-Level Synthesis (HLS) tools to leverage various optimizations, such as loop unrolling or pipelining, to generate the most optimized accelerator core. Finally, the system is constructed based on the hardware accelerator architecture [3], aiming to capitalize on the advantages offered by both general-purpose processors and FPGA-based accelerator cores.…”

Section: Introduction (mentioning)

confidence: 99%


HLS and FPGA-Powered Streaming Video Encoder Accelerator for IoTs Edge Computing

Pham-Quoc

2024

JAIT

Self Cite


In recent years, Internet of Things (IoT) applications with video processing deployed on edge computing platforms have been widely used in many areas, such as surveillance, object monitoring, and check-in/check-out systems. While the H.264 video format is used by most modern cameras due to its efficiency, the computing power of edge computing platforms is usually too low to process H.264 videos within acceptable time. This paper proposes an approach based on the high-level synthesis technique to accelerate video encoding with Field Programmable Gate Array (FPGA) platforms for edge computing applications. H.264 encoding, chosen as the case study, is profiled to locate the computationally intensive functions to accelerate on the FPGA. We use the high-level synthesis technique with optimization approaches, including loop unrolling, loop pipelining, function pipelining, and array partitioning, to generate the accelerator core. The core is then implemented with the hardware accelerator computing paradigm, which combines a host processor and the hardware accelerator cores for processing H.264 encoding. The approach is validated and its performance evaluated on an Ultra96-v2 edge computing FPGA board. Experimental results show speed-ups of up to 14.9× compared to an Advanced RISC Machine (ARM) quad-core processor. In terms of power consumption, the system requires 4.208 W.


“…To overcome this obstacle, we use FPGA-based hardware accelerator platforms for edge computing devices where we can exploit the computation flexibility of host processors as well as the high performance of reconfigurable fabrics [2,3]. Furthermore, although FPGAs suffer from low working frequency, they outperform GPUs in energy consumption and provide higher performance and energy-efficient than CPUs [4]. Therefore, an FPGA-based hardware accelerator is a promising approach for building high-performance DNN in edge devices.…”

Section: Introduction (mentioning)

confidence: 99%

An FPGA-based Convolution IP Core for Deep Neural Networks Acceleration

Nguyen-Xuan

Pham‐Quoc

2022

REV J. Electron. Commun.


The development of machine learning has made a revolution in various applications such as object detection, image/video recognition, and semantic segmentation. Neural networks, a class of machine learning, play a crucial role in this process because of their remarkable improvement over traditional algorithms. However, neural networks are now going deeper and cost a significant amount of computation operations. Therefore they usually work ineffectively in edge devices that have limited resources and low performance. In this paper, we research a solution to accelerate the neural network inference phase using FPGA-based platforms. We analyze neural network models, their mathematical operations, and the inference phase in various platforms. We also profile the characteristics that affect the performance of neural network inference. Based on the analysis, we propose an architecture to accelerate the convolution operation used in most neural networks and takes up most of the computations in networks in terms of parallelism, data reuse, and memory management. We conduct different experiments to validate the FPGA-based convolution core architecture as well as to compare performance. Experimental results show that the core is platform-independent. The core outperforms a quad-core ARM processor functioning at 1.2 GHz and a 6-core Intel CPU with speed-ups of up to 15.69 and 2.78, respectively.


“…The Xilinx PYNQ-Z2 edge computing platform with a Xilinx MPSoC Zynq FPGA device [6] is used to build a testing system based on the hardware accelerator paradigm [7]. The proposed architecture is implemented on the FPGA fabrics, while the ARM-hardwired processor is responsible for preprocessing data and determining the final results based on random forest computation.…”

mentioning

confidence: 99%

Efficient Random Forest Acceleration for Edge Computing Platforms with FPGA Technology

Pham-Quoc,

Pham-Dinh,

Kieu-Do-Nguyen

2024

JAIT

Self Cite


As one of the most successful instances of ensemble learning algorithms, Random Forest offers many advantages compared to other approaches. However, it is unsuitable for edge computing platforms due to its high computational cost. In this paper, we present an efficient architecture for performing random forest computation on edge computing platforms based on Field Programmable Gate Array (FPGA) technology. The heart of the system is our Decision Tree Unit (DTU) architecture, which processes decision trees in a pipeline to achieve better performance. One of the biggest obstacles to implementing decision trees in hardware is the memory size. We therefore also propose a sufficient structure for storing the decision trees' information used by the DTUs. Since we target edge computing platforms with limited resources and energy, the architecture supports scaling the number of DTUs in the system. Based on the available resources of the target platform, the system can be reconfigured accordingly. We implement our prototype on the PYNQ Z2 FPGA edge computing board and test the proposed system with the number of DTUs varied from 1 to 15. We conduct experiments and analysis with a certified dataset and compare against Intel Core i7 and Core i9 processors to show efficiency and scalability. The experimental results show speed-ups of up to 19.96 compared to the Intel Core i7 desktop version and 12 compared to the Intel Core i9 high-performance computing version. Regarding energy consumption, we save up to 33.24 and 146.24 compared to the two processors.

Keywords: Field Programmable Gate Array (FPGA) technology, decision tree, random forest acceleration, edge computing platforms

I. INTRODUCTION

Random Forest (RF) is a successful example of supervised learning that has many applications in various fields, including finance and banking, e-commerce, and healthcare. It is particularly useful when dealing with
