2020 International Conference on Field-Programmable Technology (ICFPT) 2020
DOI: 10.1109/icfpt51103.2020.00011

Beyond Peak Performance: Comparing the Real Performance of AI-Optimized FPGAs and GPUs

Cited by 54 publications
(27 citation statements)
References 34 publications
“…Fig. 8 illustrates the exploration results of an LSTM layer with (Lx, Lh) = (32,32) and different values of reuse factors, which are from 1 to 10. The red line represents the cases with the same Rx and Rh.…”
Section: Of An Lstm Layer (mentioning)
confidence: 99%
“…The authors in [10] put each LSTM layer on each multi-core to achieve coarse grained pipelining. In [30,31,32,33], the batching technique is used to improve the hardware throughput and utilization for LSTM inferences. However, latency can suffer since different inputs may not come at the same time, meaning that a newly arrived request has to wait until the batch is formed, which imposes a significant latency penalty.…”
Section: Related Work (mentioning)
confidence: 99%
“…With compute and data intensive deep learning (DL) becoming a major component of many applications, specialized hardware acceleration of such workloads has become a commonplace. More recently, field-programmable gate arrays (FPGAs) have been shown to deliver state-of-the-art performance when accelerating different DL workloads because of their massive parallelism, flexibility and energy efficiency [1], [2]. With new DL use cases emerging faster than ever, FPGAs are also starting to adapt.…”
Section: Introduction (mentioning)
confidence: 99%
“…In general, the development of novel FPGA architectures and CAD algorithms depends mainly on a versatile framework that consists of three main components: (1) a set of benchmarks written in a hardware description language or synthesized using high-level synthesis, (2) an architecture model that captures the organization of FPGA blocks and routing architecture as well as area/timing/power models from circuit-level implementations, and (3) a CAD flow that synthesizes the given benchmarks then implements them on a given FPGA architecture [7]. Although most research efforts in the FPGA community are focused on architecture and CAD, benchmarks actually play a crucial role in this flow.…”
Section: Introduction (mentioning)
confidence: 99%