Parameterizable Design on Convolutional Neural Networks Using Chisel Hardware Construction Language

Madineni, Mukesh Chowdary; Mario, Vega; Yang, Xiaokun

doi:10.3390/mi14030531

Cited by 2 publications

(3 citation statements)

References 28 publications

“…While allocating all parallelism N to the output channel may initially appear as a rational decision, the revised latency estimator defined in Equation (7) highlights that improper selection of parallelism parameters for the input and output channels, denoted as PI and PO (while N = PI • PO), respectively, can lead to wasted clock cycles due to ceiling operations. We defer a discussion of this aspect to the concluding part of this section and instead concentrate on exploring the existence of a viable parallelism allocation scheme given the available data per clock (DPC).…”

Section: Dot Product With Variable Length

mentioning

confidence: 99%
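The ceiling effect described in this statement can be illustrated with a small sketch. Equation (7) of the citing paper is not reproduced on this page, so the step count below is a simplified stand-in, not the authors' latency estimator: it assumes the work is tiled into `ceil(C_in / PI) * ceil(C_out / PO)` sequential steps. The function names and the layer sizes (64 input channels, 10 output channels, N = 16) are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def channel_steps(c_in: int, c_out: int, p_i: int, p_o: int) -> int:
    """Sequential steps to cover every (input, output) channel pair when
    p_i input channels and p_o output channels are processed in parallel.
    Assumed simplified model, not Equation (7) from the citing paper."""
    return ceil_div(c_in, p_i) * ceil_div(c_out, p_o)


def allocations(n: int) -> list[tuple[int, int]]:
    """All (PI, PO) pairs with PI * PO == n."""
    return [(p_i, n // p_i) for p_i in range(1, n + 1) if n % p_i == 0]


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 10 output channels, N = 16.
    for p_i, p_o in allocations(16):
        print(f"PI={p_i:2d} PO={p_o:2d} steps={channel_steps(64, 10, p_i, p_o)}")
```

Under these assumptions, giving all 16 lanes to the output channels takes 64 steps because `ceil(10 / 16)` rounds up to a full pass with 6 idle lanes, while PI = 16, PO = 1 takes 40 steps, which equals the ideal 64 × 10 / 16.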

“…To expedite this process, many researchers turn to HLS tools to automatically generate Verilog hardware description language (HDL) code from C/C++ code [6]. However, while HLS tools offer a quicker path to implementation, they often fall short of achieving optimal resource consumption and operation scheduling compared to manually crafted designs by experienced FPGA designers [7]. Consequently, a significant gap exists between the actual performance achieved and the theoretical limits imposed by the external memory bandwidth and computational capacity of FPGAs, particularly when dealing with full-precision networks, as depicted in the roofline model [8].…”

Section: Introduction

mentioning

confidence: 99%
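The roofline model mentioned in this statement bounds attainable throughput by the smaller of the peak compute rate and the product of memory bandwidth and operational intensity. A minimal sketch of that bound follows; the function name and all numeric values are hypothetical and are not taken from either paper.

```python
def attainable_gops(peak_gops: float, bandwidth_gbs: float,
                    intensity_ops_per_byte: float) -> float:
    """Roofline bound: min(compute roof, memory roof).

    peak_gops: peak compute rate in GOP/s
    bandwidth_gbs: external memory bandwidth in GB/s
    intensity_ops_per_byte: operations performed per byte moved
    """
    return min(peak_gops, bandwidth_gbs * intensity_ops_per_byte)


if __name__ == "__main__":
    # Hypothetical device: 100 GOP/s peak, 10 GB/s memory bandwidth.
    for intensity in (4.0, 20.0):
        bound = attainable_gops(100.0, 10.0, intensity)
        kind = "memory-bound" if bound < 100.0 else "compute-bound"
        print(f"intensity={intensity} ops/byte -> {bound} GOP/s ({kind})")
```

A design whose measured throughput sits well below this bound at its operational intensity is leaving performance unused, which is the gap the statement refers to.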

Flare: An FPGA-Based Full Precision Low Power CNN Accelerator with Reconfigurable Structure

Xu, Luo, Sun

2024

Sensors

Convolutional neural networks (CNNs) have significantly advanced various fields; however, their computational demands and power consumption have escalated, posing challenges for deployment in low-power scenarios. To address this issue and facilitate the application of CNNs in power constrained environments, the development of dedicated CNN accelerators is crucial. Prior research has predominantly concentrated on developing low precision CNN accelerators using code generated from high-level synthesis (HLS) tools. Unfortunately, these approaches often fail to efficiently utilize the computational resources of field-programmable gate arrays (FPGAs) and do not extend well to full precision scenarios. To overcome these limitations, we integrate vector dot products to unify the convolution and fully connected layers. By treating the row vector of input feature maps as the fundamental processing unit, we balance processing latency and resource consumption while eliminating data rearrangement time. Furthermore, an accurate design space exploration (DSE) model is established to identify the optimal design points for each CNN layer, and dynamic partial reconfiguration is employed to maximize each layer’s access to computational resources. Our approach is validated through the implementation of AlexNet and VGG16 on 7A100T and ZU15EG platforms, respectively. We achieve an average convolutional layer throughput of 28.985 GOP/s and 246.711 GOP/s for full precision. Notably, the proposed accelerator demonstrates remarkable power efficiency, with a maximum improvement of 23.989 and 15.376 times compared to current state-of-the-art FPGA implementations.

“…Their implementation of FPGA validates the acceleration of classical CNNs such as AlexNet, VGG-16, and ResNet-50. Madineni et al. (reference [4]) present a parameterized design of a CNN network using Chisel, an open-source hardware construction language developed at UC Berkeley. This design allows for flexible implementation options, supporting 16-bit, 32-bit, 64-bit, and 128-bit configurations on FPGA.…”

mentioning

confidence: 99%

Editorial for the Beyond Moore’s Law: Hardware Specialization and Advanced System on Chip

Yang

2023

Micromachines

Self Cite
