2019
DOI: 10.48550/arxiv.1902.07463
Preprint

DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators

Cited by 2 publications (5 citation statements) · References 25 publications
“…However, depthwise separable convolution spends 95% of its computation time in Conv 1 × 1, which causes a large MAdds gap between two consecutive layers (Conv 1 × 1 and Conv DW 3×3) [12]. This gap is unfriendly to embedded systems that load all weights of the network to perform convolution [24]: embedded systems need extra buffers for Conv 1 × 1.…”
Section: Approach, 2.1 Variable Group Convolution
confidence: 99%
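For concreteness, here is a minimal Python sketch of the MAdds split this quote describes; all layer shapes are hypothetical example values, not taken from the cited papers:

```python
# Sketch (hypothetical shapes): why Conv 1x1 dominates the MAdds of a
# depthwise separable block.

def depthwise_separable_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds for a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    dw = h * w * c_in * k * k   # depthwise: one k x k filter per input channel
    pw = h * w * c_in * c_out   # pointwise: 1 x 1 conv mixing channels
    return dw, pw

dw, pw = depthwise_separable_madds(h=14, w=14, c_in=256, c_out=256)
total = dw + pw
print(f"DW 3x3: {dw:,} MAdds ({dw / total:.1%})")
print(f"PW 1x1: {pw:,} MAdds ({pw / total:.1%})")
# PW 1x1 accounts for ~96.6% of the MAdds at this shape, consistent with
# the ~95% figure quoted above.
```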
“…Communication between off-chip memory and on-chip memory happens only at the start and the end of block computing when a block is grouped and computed together on embedded systems [24]. To limit the communication cost, VarGNet sets the number of output channels to be the same as the number of input channels in the normal block.…”
Section: Blocks of Variable Group Network
confidence: 99%
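A toy cost model (hypothetical sizes, simplified accounting) showing why grouping a block's layers confines off-chip traffic to the block boundary, as the quote states:

```python
# Sketch: off-chip traffic for one block under layer-by-layer vs. grouped
# (fused) execution. feature_bytes[i] is the size of the i-th feature map;
# layer i maps feature_bytes[i] -> feature_bytes[i + 1].

def offchip_traffic(feature_bytes, fused):
    if fused:
        # Grouped execution: read the block input once, write the block output once.
        return feature_bytes[0] + feature_bytes[-1]
    # Per-layer execution: every layer reads its input from and writes its
    # output to off-chip memory.
    return sum(feature_bytes[i] + feature_bytes[i + 1]
               for i in range(len(feature_bytes) - 1))

# Example block: input, two intermediates, output (bytes per feature map).
feats = [100_000, 100_000, 100_000, 100_000]
print("per-layer:", offchip_traffic(feats, fused=False))  # 600,000 bytes
print("grouped:  ", offchip_traffic(feats, fused=True))   # 200,000 bytes
```

Keeping output channels equal to input channels, as VarGNet does, also keeps these on-chip buffer sizes uniform across the block.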
“…Typically, the size of convolutional kernels is much smaller than the size of feature maps, such as k²C for kernels and 2HWC for feature maps in 2D convolutions. In light of the above two properties, an ingenious solution is to load all the data of the kernels first and then perform the convolution, popping in and popping out feature data sequentially [48]. Such a practical solution is the second intuition for our following two guidelines for efficient network design on embedded systems:…”
Section: Designing Efficient Network on Embedded Systems
confidence: 99%
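A quick numeric check of the kernel-vs-feature-map comparison in this quote, using a hypothetical layer shape:

```python
# Sketch: compare the quote's per-filter kernel size (k^2 * C) against the
# combined input/output feature maps (2 * H * W * C) of a 2D convolution.

k, H, W, C = 3, 56, 56, 128            # hypothetical example shape
kernel_elems = k * k * C               # k^2 * C, as in the quote
feature_elems = 2 * H * W * C          # input + output feature maps, 2HWC
print(f"kernel: {kernel_elems:,}  features: {feature_elems:,}  "
      f"ratio: {feature_elems / kernel_elems:.0f}x")
# kernel: 1,152  features: 802,816  ratio: 697x -- feature maps dwarf the
# kernels, so it pays to keep all weights on-chip and stream feature data
# in and out sequentially.
```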
“…Also, in these blocks, residual connections [18] are widely adopted. So, in recent compiler-side optimizations [48], the layers in a block are usually grouped and computed together. In this manner, off-chip memory and on-chip memory communicate only when computation of a block starts or ends.…”
Section: Designing Efficient Network on Embedded Systems
confidence: 99%
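A minimal sketch (hypothetical API, not the cited compiler's actual interface) of grouping a residual block so intermediates never leave on-chip buffers:

```python
# Sketch: fused execution of a residual block. Only the block input and
# output cross the off-chip boundary; intermediates stay on-chip.

import numpy as np

def fused_residual_block(x, layers):
    """x: block input already DMA'd on-chip; layers: list of callables."""
    y = x
    for layer in layers:
        y = layer(y)          # intermediate results stay in on-chip buffers
    return y + x              # residual add [18], still on-chip

# Toy stages standing in for the conv/BN/ReLU layers of a block.
scale = lambda t: 0.5 * t
relu = lambda t: np.maximum(t, 0.0)

x_onchip = np.ones((8, 8, 16), dtype=np.float32)   # DMA in (block start)
out = fused_residual_block(x_onchip, [scale, relu])
# DMA out (block end): the only other off-chip transfer for this block.
```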