2023
DOI: 10.3390/mi14030531

Parameterizable Design on Convolutional Neural Networks Using Chisel Hardware Construction Language

Abstract: This paper presents a parameterizable design generator on convolutional neural networks (CNNs) using the Chisel hardware construction language (HCL). By parameterizing structural designs such as the streaming width, pooling layer type, and floating point precision, multiple register–transfer level (RTL) implementations can be created to meet various accuracy and hardware cost requirements. The evaluation is based on generated RTL designs including 16-bit, 32-bit, 64-bit, and 128-bit implementations on field-pr…

Cited by 2 publications (3 citation statements)
References 28 publications
“…While allocating all parallelism N to the output channel may initially appear as a rational decision, the revised latency estimator defined in Equation (7) highlights that improper selection of parallelism parameters for the input and output channels, denoted as PI and PO (while N = PI • PO), respectively, can lead to wasted clock cycles due to ceiling operations. We defer a discussion of this aspect to the concluding part of this section and instead concentrate on exploring the existence of a viable parallelism allocation scheme given the available data per clock (DPC).…”
Section: Dot Product With Variable Length
mentioning
confidence: 99%
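
The ceiling effect described in the statement above can be illustrated with a small sketch. Equation (7) is not reproduced in the excerpt, so the cycle model used here, ceil(CI / PI) × ceil(CO / PO), is an assumption made for illustration only, and the names `cycles`, `factor_pairs`, and `best_allocation` are hypothetical, not taken from the citing paper.

```python
from typing import List, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def cycles(c_in: int, c_out: int, p_in: int, p_out: int) -> int:
    """Assumed cycle count: one cycle per (input tile, output tile) pair.

    This is an illustrative model, not Equation (7) from the citing paper.
    """
    return ceil_div(c_in, p_in) * ceil_div(c_out, p_out)


def factor_pairs(n: int) -> List[Tuple[int, int]]:
    """All (PI, PO) pairs with PI * PO == n."""
    return [(p, n // p) for p in range(1, n + 1) if n % p == 0]


def best_allocation(c_in: int, c_out: int, n: int) -> Tuple[int, int]:
    """Pick the (PI, PO) split of total parallelism n with the fewest cycles."""
    return min(factor_pairs(n), key=lambda pair: cycles(c_in, c_out, *pair))


if __name__ == "__main__":
    c_in, c_out, n = 16, 10, 8
    for p_in, p_out in factor_pairs(n):
        print(f"PI={p_in}, PO={p_out}: {cycles(c_in, c_out, p_in, p_out)} cycles")
    print("best:", best_allocation(c_in, c_out, n))
```

Under this assumed model, with 16 input channels, 10 output channels, and N = 8, giving all parallelism to the output channel (PI = 1, PO = 8) costs 32 cycles because ceil(10 / 8) rounds up to 2, while PI = 4, PO = 2 costs 20 cycles, which matches the ideal 16 × 10 / 8.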
“…To expedite this process, many researchers turn to HLS tools to automatically generate Verilog hardware description language (HDL) code from C/C++ code [6]. However, while HLS tools offer a quicker path to implementation, they often fall short of achieving optimal resource consumption and operation scheduling compared to manually crafted designs by experienced FPGA designers [7]. Consequently, a significant gap exists between the actual performance achieved and the theoretical limits imposed by the external memory bandwidth and computational capacity of FPGAs, particularly when dealing with full-precision networks, as depicted in the roofline model [8].…”
Section: Introduction
mentioning
confidence: 99%
“…Their implementation of FPGA validates the acceleration of classical CNNs such as AlexNet, VGG-16, and ResNet-50. Madineni et al. (reference [4]) present a parameterized design of a CNN network using Chisel, an open-source hardware construction language developed at UC Berkeley. This design allows for flexible implementation options, supporting 16-bit, 32-bit, 64-bit, and 128-bit configurations on FPGA.…”
mentioning
confidence: 99%