2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines
DOI: 10.1109/fccm.2012.13
Impact of Cache Architecture and Interface on Performance and Area of FPGA-Based Processor/Parallel-Accelerator Systems

Abstract: We describe new multi-ported cache designs suitable for use in FPGA-based processor/parallel-accelerator systems, and evaluate their impact on application performance and area. The baseline system comprises a MIPS soft processor and custom hardware accelerators with a shared memory architecture: an on-FPGA L1 cache backed by off-chip DDR2 SDRAM. Within this general system model, we evaluate traditional cache design parameters (cache size, line size, associativity). In the parallel accelerator context, we…

Cited by 49 publications (36 citation statements); References 18 publications.
“…= 0), one of the two BRAM ports is dedicated to communicate with the host (the same situation is reported in [Choi et al, 2012]). Therefore, a crossbar is used to share the local memories of the two kernels HW i and HW j .…”
Section: Modeling Shared Local Memory
confidence: 87%
“…Research in [Choi et al, 2012] proposed a multi-ported cache design for communication among multiple accelerator kernels in an FPGA-based accelerator system. However, this proposal is system-dependent, since it assumes that the on-chip memory can run at 2× the speed of the system clock (the clock for kernels).…”
Section: Hardware Level Optimization
confidence: 99%
“…Recent work has also explored the design space of the cache micro-architecture [15][16][17][18][19]. Matthews et al [17] explore the efficiency in terms of speed-up versus area increase of parallel coherent L1 caches with respect to size, associativity and replacement rule in an FPGA-based soft multi-core processor.…”
Section: Related Work
confidence: 99%
“…Matthews et al [17] explore the efficiency in terms of speed-up versus area increase of parallel coherent L1 caches with respect to size, associativity and replacement rule in an FPGA-based soft multi-core processor. Similarly, Choi et al [18] compare different configurations of cache size, line size and associativity of shared on-chip caches, in addition to two approaches for increasing the number of access ports of the shared cache. FCache [16] and LEAP Coherent Memories [15] target the micro-architecture of coherency mechanisms for shared memory systems in FPGAs.…”
Section: Related Work
confidence: 99%
“…Multipumping is also widely used in memories to "mimic" the availability of extra memory ports. Choi et al's work in [4] found that multi-pumped caches had the best performance and area for FPGA processor/parallel-accelerator systems. A Xilinx white paper [16] describes how multi-pumping can improve the throughput of a DSP block in isolation, outside of the HLS context.…”
Section: Related Work
confidence: 99%
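The multi-pumping idea referenced in the excerpts above can be illustrated with a minimal cycle-level sketch (an assumption for illustration, not code from any of the cited papers): a memory with one physical port, clocked at 2× the system clock, services two requests per system cycle and thus appears dual-ported to the processor and accelerator.

```python
# Minimal sketch (hypothetical model, not from the paper): multi-pumping,
# where a single-ported memory run at 2x the system clock mimics a
# dual-ported memory by serving one request per fast-clock phase.

class MultiPumpedMemory:
    """One physical port, clocked at twice the system clock."""

    def __init__(self, size):
        self.data = [0] * size

    def system_cycle(self, requests):
        """Service up to two (op, addr, value) requests in one system
        cycle -- one per fast-clock phase. Returns a result per request
        (the read value, or None for a write)."""
        assert len(requests) <= 2, "only two fast-clock phases per system cycle"
        results = []
        for op, addr, value in requests:
            if op == "read":
                results.append(self.data[addr])
            else:  # write
                self.data[addr] = value
                results.append(None)
        return results


# Two accessors (e.g., soft processor and accelerator) share one system cycle.
mem = MultiPumpedMemory(16)
mem.system_cycle([("write", 3, 42), ("write", 7, 7)])
print(mem.system_cycle([("read", 3, 0), ("read", 7, 0)]))  # [42, 7]
```

The model makes the system-dependence noted in [Choi et al, 2012] concrete: the trick only works if the physical memory can close timing at twice the kernel clock frequency.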