VENICE: A compact vector processor for FPGA applications

Severance, Aaron; Lemieux, Guy

doi:10.1109/hotchips.2011.7477515

Cited by 7 publications

(7 citation statements)

References 6 publications


“…There have been many proposed soft processor architectures with improved performance, such as using multithreading [1], VLIW processors [2], vector processors [3], and even architectures developed from a first-principles examination of an FPGA's capabilities [4]. However, in order to preserve the familiar single-threaded programming model, superscalar and out of order processors are required, even though they are generally less efficient than specialized architectures.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient methods for out-of-order load/store execution for high-performance soft processors

Wong

Betz

Rose

2013

2013 International Conference on Field-Programmable Technology (FPT)


Abstract: As FPGAs continue to increase in size, it becomes increasingly feasible and desirable to build higher performance soft processors. Preserving the familiar single-threaded programming model can be done with an out of order processor. The ability to execute memory loads and stores out of order has a large impact on performance, but this is difficult to do because the dependencies between stores and loads are not known until addresses are computed. Out of order memory disambiguation is traditionally done with CAMs in the load queue and store queue, but large CAMs are inefficient on FPGAs. Store Queue Index Prediction (SQIP) and NoSQ propose to replace CAMs with store-load forwarding prediction and load re-execution. We implement four memory disambiguation schemes (in-order, CAM, SQIP, NoSQ) on a Stratix IV FPGA and evaluate the area and delay trade-offs. We find that CAM area and delay degrade quickly with load/store queue size, while SQIP and NoSQ have little degradation with queue size but have area overhead for prediction and predictor training hardware. SQIP and NoSQ use less area than CAMs beyond 32 and 16 load/store queue entries, respectively, and have higher maximum frequency beyond 4 entries.
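As an illustration of the problem the abstract describes, the following is a minimal software sketch of a store-queue lookup for store-to-load forwarding. It is not taken from the cited paper: the names `StoreEntry` and `forward_from_store_queue` are hypothetical, the stall-on-unknown-address policy is an assumed conservative choice, and the sequential loop only models what a hardware CAM would compare in parallel.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreEntry:
    """One in-flight store (hypothetical model, not the paper's design)."""
    addr: Optional[int]  # None while the store address is not yet computed
    data: int


def forward_from_store_queue(
    older_stores: List[StoreEntry], load_addr: int
) -> Tuple[str, Optional[int]]:
    """Decide how a load obtains its value.

    `older_stores` holds the stores that precede the load in program
    order, oldest first. A hardware CAM compares `load_addr` against
    every entry at once; this loop visits them youngest to oldest.
    """
    for entry in reversed(older_stores):
        if entry.addr is None:
            # Dependence unknown: assumed conservative policy is to wait.
            return ("stall", None)
        if entry.addr == load_addr:
            # Youngest matching older store supplies the data.
            return ("forward", entry.data)
    # No older store writes this address, so read memory.
    return ("memory", None)
```

For example, with older stores to addresses `0x10`, `0x20`, and `0x10` again, a load from `0x10` would take its data from the third (youngest) store. Prediction-based schemes such as SQIP and NoSQ, as the abstract notes, aim to avoid this all-entries comparison.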


“…These SVPs were traditional load/store vector architectures. The first scratchpad-based SVP was VEGAS [1] followed by VENICE [4]. VENICE added 2D/3D vector instructions (but only 1D DMA), as well as condition codes using the 9th bit of FPGA memory blocks.…”

Section: Related Work (mentioning)

confidence: 99%

Embedded supercomputing in FPGAs with the VectorBlox MXP Matrix Processor

Severance

Lemieux

2013

2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)

Self Cite


Embedded systems frequently use FPGAs to perform highly parallel data processing tasks. However, building such a system usually requires specialized hardware design skills with VHDL or Verilog. Instead, this paper presents the VectorBlox MXP Matrix Processor, an FPGA-based soft processor capable of highly parallel execution. Programmed entirely in C, the MXP is capable of executing data-parallel software algorithms at hardware-like speeds. For example, the MXP running at 200 MHz or higher can implement a multi-tap FIR filter and output 1 element per clock cycle. MXP's parameterized design lets the user specify the amount of parallelism required, ranging from 1 to 128 or more parallel ALUs. Key features of the MXP include a parallel-access scratchpad memory to hold vector data and high-throughput DMA and scatter/gather engines. To provide extreme performance, the processor is expandable with custom vector instructions and custom DMA filters. Finally, the MXP seamlessly ties into existing Altera and Xilinx development flows, simplifying system creation and deployment.


“…The vector components are automatically formed by linking several smaller components together. Exploiting vectorization requires the programmer to use certain pre-defined builtin-functions (conceptually similar to [21]). …”

Section: Accelerators and Hyper-tasks (mentioning)

confidence: 99%

Empowering OpenMP with automatically generated hardware

Podobas

Brorsson

2016

2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS)


Abstract: OpenMP enables productive software development that targets shared-memory general purpose systems. However, OpenMP compilers today have little support for future heterogeneous systems: systems that will more than likely contain Field Programmable Gate Arrays (FPGAs) to compensate for the lack of parallelism available in general purpose systems. We have designed a high-level synthesis flow that automatically generates parallel hardware from unmodified OpenMP programs. The generated hardware is composed of accelerators tailored to act as hardware instances of the OpenMP task primitive. We drive decision making of complex details within accelerators through a constraint-programming model, minimizing the expected input from the (often) hardware-oblivious software developer. We evaluate our system and compare them to two state of the art architectures, the Xeon PHI and the AMD Opteron, where we find our accelerators to perform on par with the two ASIC processors.

