Towards reliable 5Gbps wave-pipelined and 3Gbps surfing interconnect in 65nm FPGAs

Teehan, Paul; Lemieux, Guy; Greenstreet, Mark R.

doi:10.1145/1508128.1508136

Cited by 14 publications

(7 citation statements)

References 27 publications

“…This saves the area, power and timing overhead of using registers. It was shown in [20] that wave-pipelined interconnect could be used in an FPGA.…”

Section: Why Not Wave-pipelining? (mentioning)

confidence: 99%

“…As a result, the effective latency of a wave-pipelined link changes with clock frequency. Additionally, wave-pipelining systems must operate robustly in the presence of die-to-die and on-chip variation, as well as in the presence of crosstalk and power supply noise [20]. These non-idealities are expected to become more significant in future process technologies, and the flexibility of FPGAs would make verifying such systems difficult.…”

Section: Why Not Wave-pipelining? (mentioning)

confidence: 99%

“…Additionally, emerging FPGA communication styles such as embedded NoCs [6,1] result in variable latency communication, essentially requiring designs to be latency insensitive. LID also does not require modification of existing FPGA architectures, as would be required to fully support wave-pipelining [20], asynchronous, or GALS [17] design styles.…”

Section: Latency Insensitive Design (mentioning)

confidence: 99%

Quantifying the cost and benefit of latency insensitive communication on FPGAs

Murray

Betz

2014

Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Latency insensitive communication offers many potential benefits for FPGA designs, including easier timing closure by enabling automatic pipelining, and easier interfacing with embedded NoCs. However, it is important to understand the costs and trade-offs associated with any new design style. This paper presents optimized implementations of latency insensitive communication building blocks, quantifies their overheads in terms of area and frequency, and provides guidance to designers on how to generate high-speed and area-efficient latency insensitive systems.

“…For matrices generated from spice3f5, we use circuit benchmarks provided by Simucad [9], Igor Markov [8] and Paul Teehan [7]. Our benchmark set captures matrices from a broad range of problems that have widely differing structure.…”

Section: Benchmarks (mentioning)

confidence: 99%

“…• Quantitative empirical comparison of KLU Matrix Solver on the Intel Core i7 965 and a Virtex-5 FPGA for a variety of matrices generated from spice3f5 circuit simulations [7], [8], [9], the UFL Sparse Matrix collection [10] and Power-system matrices from the Matrix Market suite [11].…”

Section: Introduction Spice (Simulation Program With Integrated… (mentioning)

confidence: 99%

Parallelizing sparse Matrix Solve for SPICE circuit simulation using FPGAs

Kapre

DeHon

2009

2009 International Conference on Field-Programmable Technology

Fine-grained dataflow processing of sparse Matrix-Solve computation (A x = b) in the SPICE circuit simulator can provide an order of magnitude performance improvement on modern FPGAs. Matrix Solve is the dominant component of the simulator, especially for large circuits, and is invoked repeatedly during the simulation, once for every iteration. We process sparse-matrix computation generated from the SPICE-oriented KLU solver in dataflow fashion across multiple spatial floating-point operators coupled to high-bandwidth on-chip memories and interconnected by a low-latency network. Using this approach, we are able to show speedups of 1.2-64× (geometric mean of 8.8×) for a range of circuits and benchmark matrices when comparing double-precision implementations on a 250MHz Xilinx Virtex-5 FPGA (65nm) and an Intel Core i7 965 processor (45nm).
