2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS) 2012
DOI: 10.1109/mwscas.2012.6292202
NeuFlow: Dataflow vision processing system-on-a-chip

Cited by 74 publications (55 citation statements). References 4 publications.
“…This architecture's data I/O bandwidth of 525 MB/s full-duplex for 203 GOp/s, 2.58 MB/GOp, is far better than the results shown in previous work, such as 24.7 MB/GOp for the neuFlow ASIC [16] or 20 MB/GOp for nn-X [17].…”
Section: Results (contrasting)
confidence: 54%
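The figure in the statement above is simple arithmetic: I/O bandwidth divided by compute throughput gives the data moved per unit of work. A minimal sketch (the helper name is mine; the numbers are those quoted):

```python
def mb_per_gop(bandwidth_mb_s: float, throughput_gop_s: float) -> float:
    """Data moved (MB) per GOp of compute at full utilization."""
    return bandwidth_mb_s / throughput_gop_s

# 525 MB/s at 203 GOp/s, as quoted above:
ratio = mb_per_gop(525.0, 203.0)
print(f"{ratio:.2f} MB/GOp")  # ~2.59; the quoted 2.58 appears to be truncated
```

Lower MB/GOp means less off-chip traffic per operation, which is why the quoted design compares favorably against the 24.7 MB/GOp and 20 MB/GOp figures cited for neuFlow and nn-X.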
“…A popular architecture is the one which started as CNP [14] and was further improved and renamed to NeuFlow [15], [16] and later on nn-X [17].…”
Section: FPGA Implementations (mentioning)
confidence: 99%
“…Many approaches have been proposed to accelerate the computation or reduce the memory footprint and storage size of DNNs. One approach from the hardware perspective is designing hardware accelerators for the computationally expensive operations in DNNs [19,20,21]. From the algorithmic perspective, a popular route to faster and smaller models is to impose constraints on the parameters of a DNN to reduce the number of free parameters and computational complexity, like low-rankness [22,23,24,25,26,27], sparsity [28,29,30,31], circulant property [32], and sharing of weights [33,34].…”
Section: Introduction (mentioning)
confidence: 99%
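One of the algorithmic routes named in the statement above, low-rankness, can be illustrated with a quick parameter count: factoring a dense weight matrix into two low-rank factors shrinks both storage and multiply count. A hypothetical sketch (shapes and rank are illustrative, not drawn from any cited paper):

```python
import numpy as np

m, n, r = 1024, 1024, 64  # layer shape and target rank (illustrative)
W = np.random.randn(m, n)

# Truncated SVD gives the best rank-r approximation (Eckart-Young).
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]  # m x r factor
B = Vt[:r, :]         # r x n factor

dense_params = m * n             # 1,048,576 free parameters
lowrank_params = m * r + r * n   # 131,072 -> 8x fewer
print(dense_params, lowrank_params)
```

The same factorization also replaces one large matrix-vector product with two smaller ones, cutting the operation count by the same factor when r << min(m, n).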
“…It can be clearly seen that the typical on-chip (L1, L2) storages in the memory hierarchy (caches or SRAM-based scratchpad memories) cannot accommodate even a single layer of these ConvNets, as the required storages per layer range from 6 MB to over 300 MB. In addition, the assumption that all coefficients can be stored on-chip ( [16][17] [34]) is not valid anymore, since an additional storage of 14 ∼ 280 MB is required to accommodate the coefficients. Overall, 16∼580 MB is needed for layer-by-layer execution, demonstrating that DRAM is necessary as the main storage for deep ConvNets and also motivating computation near main memory.…”
Section: B. Implementation Challenges of Modern ConvNets (mentioning)
confidence: 99%
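The storage estimates behind the statement above follow from counting feature-map and coefficient bytes per layer. A minimal sketch, using a hypothetical layer shape rather than any layer from the cited paper:

```python
def feature_map_mb(channels: int, height: int, width: int,
                   bytes_per_elem: int = 4) -> float:
    """Activation storage for one feature map, in MB (fp32 by default)."""
    return channels * height * width * bytes_per_elem / 2**20

def weights_mb(out_ch: int, in_ch: int, kh: int, kw: int,
               bytes_per_elem: int = 4) -> float:
    """Coefficient storage for one conv layer, in MB (fp32 by default)."""
    return out_ch * in_ch * kh * kw * bytes_per_elem / 2**20

# e.g. a 256-channel 112x112 feature map with a 3x3 conv (illustrative):
act = feature_map_mb(256, 112, 112)  # 12.25 MB of activations
w = weights_mb(256, 256, 3, 3)       # 2.25 MB of coefficients
print(act, w)
```

Even this single illustrative layer exceeds a typical L1/L2 on-chip budget, consistent with the statement's point that per-layer requirements of 6 MB to over 300 MB push deep ConvNets to DRAM as the main storage.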