Partial Reconfiguration for Design Optimization

Nguyen, Marie; Serafin, Nathan; Hoe, James C.

doi:10.1109/fpl50879.2020.00061

Cited by 6 publications

(4 citation statements)

References 40 publications


“…$DF_{x,y} = \frac{\mathrm{Accuracy}_{x} - \mathrm{Accuracy}_{x-1}}{\mathrm{Power}_{x,y} - \mathrm{Power}_{x-1,y-1}}$ (6) $DF_{\mathrm{target}} = \frac{\mathrm{Accuracy}_{\max} - \mathrm{Accuracy}_{\min}}{\mathrm{Power}_{\max} - \mathrm{Power}_{\min}}$ (7) IV. CONCLUSION While deploying DNNs on edge devices, reducing energy costs is an essential issue that needs to be solved.…”

Section: E Optimal Hardware-model Selection (mentioning)

confidence: 99%
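The two factors quoted above can be illustrated with a short Python sketch. It reads equations (6) and (7) as ratios of an accuracy difference to a power difference. The function name `decision_factor` and the `(accuracy, power)` values are hypothetical, chosen only for illustration, and the excerpt does not show how the citing paper goes on to use these factors.

```python
def decision_factor(acc_hi, acc_lo, power_hi, power_lo):
    """Accuracy gained per unit of additional power between two configurations."""
    d_power = power_hi - power_lo
    if d_power == 0:
        raise ValueError("power values must differ")
    return (acc_hi - acc_lo) / d_power


# Hypothetical (accuracy, power in watts) points, sorted by increasing power.
configs = [(0.70, 2.0), (0.80, 3.0), (0.85, 5.0)]

# Equation (7): ratio over the full accuracy and power range.
accs = [a for a, _ in configs]
powers = [p for _, p in configs]
df_target = decision_factor(max(accs), min(accs), max(powers), min(powers))

# Equation (6), simplified to a single index: ratio between neighbouring points.
df_steps = [
    decision_factor(a1, a0, p1, p0)
    for (a0, p0), (a1, p1) in zip(configs, configs[1:])
]

print(df_target, df_steps)
```

With these made-up numbers, the first step buys more accuracy per watt than the overall range does, while the second step buys less.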

“…This ability makes FPGA an affordable solution for rapid technology evolution. In addition, the Dynamic Partial Reconfiguration (DPR) technology [7] makes it possible to adopt run-time optimization strategies to reallocate computing resources and memory access paths in the FPGA platform. [8] proposed a convolution processing unit by employing DPR to achieve adaptive precision in the run-time.…”

Section: Introduction (mentioning)

confidence: 99%


Modelling and Analysis of FPGA-based MPSoC System with Multiple DNN Accelerators

Gao, Zhu, Saha et al. 2023

2023 21st IEEE Interregional NEWCAS Conference (NEWCAS)


Deep Neural Networks (DNNs) have been widely applied in many fields for decades, and a standard method for deploying them on embedded systems involves using accelerators. However, due to the resource constraints of embedded systems, improving energy and computing efficiency becomes one of the research challenges in this domain. DNN model optimization and NAS (Neural Architecture Searching) are commonly used to strengthen the DNN model running efficiency on an embedded system. However, because the system's runtime workloads are varied in practical situations, to further improve the computing efficiency of the system at runtime, real-time hardware and software design space exploration is required to ensure the system is running at the optimal time state at runtime. This paper presents a comprehensive modelling and analysis approach for the performance data (e.g., latency, energy consumption, accuracy, etc.) collected from an AMD-Xilinx heterogeneous MPSoC platform equipped with multiple DNN accelerators. The results demonstrate that the relationships between accuracy loss, hardware performance, and model size are significantly correlated. Furthermore, an appropriate hardware and software configuration could be obtained by giving constraints at runtime.


“…Compared to floating point, it has a smaller dynamic range since there is no exponent. Still, the hardware implementation is easier since it considers two numbers (integer and fractional), and it is usually employed on custom accelerators, and FPGA [45,46].…”

Section: A Data Types (mentioning)

confidence: 99%
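The fixed-point trade-off described in the excerpt above can be sketched in a few lines of Python. The helper names `to_fixed` and `from_fixed` and the 16-bit format with 8 fractional bits are assumptions made for illustration; they are not taken from the citing paper.

```python
def to_fixed(value, frac_bits=8, total_bits=16):
    """Quantize a real number to a signed fixed-point integer, saturating on overflow."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    raw = round(value * scale)
    return max(lo, min(hi, raw))


def from_fixed(raw, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


# A value inside the representable range loses only a little precision.
raw_pi = to_fixed(3.14159)
print(raw_pi, from_fixed(raw_pi))

# A value outside the range saturates: with no exponent field, the
# largest representable magnitude is fixed by the integer bits.
raw_big = to_fixed(1000.0)
print(raw_big, from_fixed(raw_big))
```

Because the scale factor is a fixed power of two, conversion needs only a multiply and a clamp, which is one reason such formats are simple to implement in hardware.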

Reduced-Precision Acceleration of Radio-Astronomical Imaging on Reconfigurable Hardware

et al. 2022


Radio telescopes produce large volumes of data that need to be processed to obtain high-resolution sky images. This is a complex task that requires computing systems that provide both high performance and high energy efficiency. Hardware accelerators such as GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) can provide these two features and are thus an appealing option for this application. Most HPC (High-Performance Computing) systems operate in double precision (64-bit) or in single precision (32-bit), and radio-astronomical imaging is no exception. With reduced precision computing, smaller data types (e.g., 16-bit) are used to improve energy efficiency and throughput performance in noise-tolerant applications. We demonstrate that reduced precision can also be used to produce high-quality sky images. To this end, we analyze the gridding component (Image-Domain Gridding) of the widely-used WSClean imaging application. Gridding is typically one of the most time-consuming steps in the imaging process and, therefore, an excellent candidate for acceleration. We identify the minimum required exponent and mantissa bits for a custom floating-point data type. Then, we propose the first custom floating-point accelerator on a Xilinx Alveo U50 FPGA using High-Level Synthesis. Our reduced-precision implementation improves the throughput and energy efficiency of respectively 1.84x and 2.03x compared to the single-precision floating-point baseline on the same FPGA. Our solution is also 2.12x faster and 3.46x more energy-efficient than an Intel i9 9900k CPU (Central Processing Unit) and manages to keep up in throughput with an AMD RX 550 GPU.


“…In other words, the mapping of application to CPUs and FPGAs does not change. This static model is highly effective for settings where the CPU/FPGA hardware is exclusively used for a single application, and the reconfigurable logic literature has intensively studied various problems in this space such as how to implement functions on FPGAs for optimal performance [36] [15] [24] [29] and how to improve FPGA programming [39] [42].…”

Section: Introduction (mentioning)

confidence: 99%

Xar-Trek: Run-time Execution Migration among FPGAs and Heterogeneous-ISA CPUs

Horta, Chuang, VSathish et al. 2021

Preprint


Datacenter servers are increasingly heterogeneous: from x86 host CPUs, to ARM or RISC-V CPUs in NICs/SSDs, to FPGAs. Previous works have demonstrated that migrating application execution at run-time across heterogeneous-ISA CPUs can yield significant performance and energy gains, with relatively little programmer effort. However, FPGAs have often been overlooked in that context: hardware acceleration using FPGAs involves statically implementing select application functions, which prohibits dynamic and transparent migration. We present Xar-Trek, a new compiler and run-time software framework that overcomes this limitation. Xar-Trek compiles an application for several CPU ISAs and select application functions for acceleration on an FPGA, allowing execution migration between heterogeneous-ISA CPUs and FPGAs at run-time. Xar-Trek's run-time monitors server workloads and migrates application functions to an FPGA or to heterogeneous-ISA CPUs based on a scheduling policy. We develop a heuristic policy that uses application workload profiles to make scheduling decisions. Our evaluations conducted on a system with x86-64 server CPUs, ARM64 server CPUs, and an Alveo accelerator card reveal 88%-1% performance gains over no-migration baselines. CCS Concepts: • Computer systems organization → Reconfigurable computing; Heterogeneous (hybrid) systems; High-level language architectures; • Applied computing → Data centers.

