2018
DOI: 10.1016/j.micpro.2018.04.004
Throughput optimizations for FPGA-based deep neural network inference

Abstract: Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems. Yet, the number of applications that can benefit from the mentioned possibilities is rapidly rising. In this paper, we propose novel architectures for the inference of previously learned and arbitrary deep neural networks on FPGA-based SoCs that are abl…

Cited by 19 publications (11 citation statements). References 15 publications.
“…2 is designed for GPU clusters, it also applies to other accelerators like TPU or FPGA. Existing measurements have shown that TPU [50] and FPGA [69] also exhibit a non-linear, monotonically-increasing relationship between t p and x, making it feasible to adopt LB-BSP in heterogeneous clusters with those hardware.…”
Section: A Drop-in Algorithm for LB-BSP in GPU Clusters
confidence: 99%
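The statement above notes that per-batch processing time t_p grows non-linearly but monotonically with batch size x on GPUs, TPUs, and FPGAs, which is what makes load-balanced batch assignment (LB-BSP) feasible. A minimal sketch of that idea, not taken from the cited papers: given a monotone time model t_p(x) per worker, binary-search a common deadline and give each worker the largest batch it can finish by then. The function name `assign_batches` and the timing models are illustrative assumptions.

```python
# Illustrative sketch (not from the cited papers): load-balanced batch
# assignment, assuming each worker's per-batch time t_p(x) is
# monotonically increasing in the batch size x.

def assign_batches(time_fns, total_batch):
    """Split total_batch across workers so finish times are roughly equal.

    time_fns: one monotone function t_p(x) -> seconds per worker.
    Strategy: binary-search a shared deadline T, then give each worker
    the largest integer batch it can finish within T (valid because
    t_p is monotone)."""
    def capacity(T):
        sizes = []
        for t in time_fns:
            lo, hi = 0, total_batch
            while lo < hi:                 # largest x with t(x) <= T
                mid = (lo + hi + 1) // 2
                if t(mid) <= T:
                    lo = mid
                else:
                    hi = mid - 1
            sizes.append(lo)
        return sizes

    lo_T, hi_T = 0.0, max(t(total_batch) for t in time_fns)
    for _ in range(50):                    # bisect the deadline
        mid_T = (lo_T + hi_T) / 2
        if sum(capacity(mid_T)) >= total_batch:
            hi_T = mid_T
        else:
            lo_T = mid_T

    sizes = capacity(hi_T)                 # sum(sizes) >= total_batch here
    surplus = sum(sizes) - total_batch     # shave off the integer overshoot
    for i in range(len(sizes)):
        take = min(surplus, sizes[i])
        sizes[i] -= take
        surplus -= take
        if surplus == 0:
            break
    return sizes
```

With a fast worker modelled as t_p(x) = 0.1x and a slower one as t_p(x) = 0.3x + 0.5, the fast worker receives the larger share of a 100-sample batch, mirroring the heterogeneous-cluster setting the statement describes.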
“…With the rapid development of semiconductor technology, the continuous increase in the scale of integrated circuits, and the continuous improvement of various development technology levels, English speech recognition technology has gradually become smaller after being combined with embedded systems based on DSP, FPGA, ASIC, and other devices. With the development of industrialization and practicality, the application field is also getting bigger and bigger [3]. As a modern information technology with extensive social and economic benefits, English speech recognition has made certain achievements, but there are still a series of problems when facing practical use.…”
Section: Introduction
confidence: 99%
“…Similarly, in [21], Yoon et al propose an expandable architecture capable of selective retraining when presented with new tasks and data. The continuous learning research community is mainly interested in scenarios [24] where datasets are unavailable in the form of neatly organized fully tagged datasets, but in an evolving application where new samples are made available at different times after the models are trained on the main dataset. An example of this would be a self-driving car trained on an object detection dataset receiving new data samples during deployment.…”
Section: Introduction
confidence: 99%
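The continual-learning scenario described above — a model trained on a main dataset that must absorb new samples arriving at different times after deployment — can be made concrete with a toy sketch. The class below is an illustrative assumption, not the method of Yoon et al. [21]: a nearest-centroid classifier whose class centroids are updated incrementally as each new labelled sample arrives, with no full retraining.

```python
# Illustrative only: a toy classifier for the streaming setting above --
# new labelled samples arrive after initial training and are folded into
# running class centroids without retraining on the full dataset.

class StreamingCentroidClassifier:
    def __init__(self):
        self.sums = {}    # label -> running per-feature sums
        self.counts = {}  # label -> number of samples seen

    def update(self, x, label):
        """Fold one new sample into the running centroid for its class."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.counts[label] += 1
        for i, v in enumerate(x):
            self.sums[label][i] += v

    def predict(self, x):
        """Return the label whose centroid is nearest to x (squared L2)."""
        best, best_d = None, float("inf")
        for label, s in self.sums.items():
            c = self.counts[label]
            d = sum((v - si / c) ** 2 for v, si in zip(x, s))
            if d < best_d:
                best, best_d = label, d
        return best
```

In the self-driving-car example, `update` would be called on each newly collected sample during deployment, shifting the centroids without revisiting the original training set.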