Architectures and Execution Models for Hardware/Software Compilation and Their System-Level Realization

Lange, Holger; Koch, Andreas

doi:10.1109/tc.2009.180

Cited by 25 publications

(15 citation statements)

References 15 publications

“…We modified the Reference Design by inserting the TechMod into the MCI bus between PowerPC and external 200 MHz DDR2-SDRAM main memory. The GPP can thus access the HA in a simple memory mapped fashion, while the 100 MHz HA can directly access main memory, following our FastLane architecture [16] of giving the GPP priority over the HA to ensure stable system operation. The system was implemented using Xilinx EDK and ISE 10.3, and Synplify Premier DP 9.6.1.…”

Section: Methods (mentioning)

confidence: 99%

MARC II: A parametrized speculative multi-ported memory subsystem for reconfigurable computers

Lange

Wink

Koch

2011

2011 Design, Automation & Test in Europe

Abstract: We describe a parameterized memory system suitable as target for automatic high-level language to hardware compilers for reconfigurable computers. It fully supports the spatial computation paradigm by allowing the realization of each memory operator by a dedicated hardware memory port. Interport coherency is maintained only for those ports that actually require it, and efficient speculative execution is enabled by a dynamic scheme for arbitrating access to shared resources (such as main memory), relying on techniques inspired by the branch prediction of conventional software-programmable processors.

“…While explored mostly for GPPs [13], [17], [21], speculative execution is not limited to that domain [3] and can also be used in the parallel paradigms used for HAs [16].…”

Section: A. Speculation In Temporal and Spatial Compute Models (mentioning)

confidence: 99%

MARC II: A parametrized speculative multi-ported memory subsystem for reconfigurable computers

Lange

Wink

Koch

2011

2011 Design, Automation & Test in Europe

“…As target platform, we employ a Xilinx ML507 board (Virtex-5 FX-based), using the hardware and software environment described in [16] to achieve high-throughput low-latency access to shared memory between the accelerator(s) and the general-purpose PowerPC 440 processor. As the XC5VFX70T device on the actual board is too small to hold the complete system-on-chip (processor buses, memory controller, network interface, etc.)…”

Section: B. Impact On High-level Synthesis (mentioning)

confidence: 99%

Low-latency double-precision floating-point division for FPGAs

Liebig

Koch

2014

2014 International Conference on Field-Programmable Technology (FPT)

Self Cite

Abstract: With growing FPGA capacities, applications requiring more intensive use of floating-point arithmetic become feasible candidates for acceleration using reconfigurable logic. Still among the more uncommon operations, however, are fast double-precision divider units. Since our application domain (acceleration of custom-compiled convex solvers) heavily relies on these blocks, we have implemented low-latency dividers based on the Goldschmidt algorithm that are accurate up to 1 bit of least precision (1-ULP). On Virtex-6 devices, our units operate at 200 MHz and significantly outperform other state-of-the-art 1-ULP dividers. We evaluate our blocks both stand-alone, as well as on the application-level when used for the high-level synthesis of the convex solver cores.
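The Goldschmidt algorithm named in this abstract can be illustrated in software. The following is a minimal sketch only, not the authors' hardware divider: the function name `goldschmidt_divide` and the `iterations` parameter are hypothetical, it works on Python floats, and it makes no claim of 1-ULP accuracy. It normalizes the divisor into [0.5, 1) and then repeatedly multiplies numerator and denominator by the same factor until the denominator converges to 1.

```python
import math


def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d by Goldschmidt iteration (illustrative sketch)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if d < 0 else 1.0
    # abs(d) = mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(abs(d))
    num = sign * math.ldexp(n, -exponent)  # scale numerator by the same power of two
    den = mantissa
    for _ in range(iterations):
        factor = 2.0 - den  # drives den quadratically toward 1
        num *= factor
        den *= factor
    return num  # den is now ~1, so num approximates n / d
```

With the denominator error starting at no more than 0.5 and squaring on every step, six iterations bring it below double-precision resolution; a hardware unit would typically start from a table-based seed to need fewer steps.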

“…At this level, the loop has been encapsulated as a single operation. When it detects the loop termination condition, it signals the end of hardware execution to the hardware/software interface layer [16] and passes back the computed factorial from hardware to software. Since we compile for the ACS target to a fully spatial hardware implementation with no operator reuse, we can employ a variant of the classical As-Soon-As-Possible (ASAP) static scheduling algorithm [21], adding just minor extensions to obey explicit constraints (discussed in Section 4.4).…”

Section: Hardware Synthesis In Nymble (mentioning)

confidence: 99%
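The classical ASAP scheduling mentioned in the statement above can be sketched as follows. This is a generic textbook version under stated assumptions, not the Nymble variant: the function name `asap_schedule`, the `latency` and `deps` dictionaries, and the operation names in the usage note are all hypothetical, and the explicit-constraint extensions the statement refers to are not modeled. Because the hardware is fully spatial with no operator reuse, there are no resource conflicts, so each operation simply starts once all of its predecessors have finished.

```python
from collections import deque


def asap_schedule(latency, deps):
    """Return the earliest start cycle of each operation.

    latency: dict mapping operation name -> latency in cycles
    deps:    dict mapping operation name -> list of predecessor names
    """
    successors = {op: [] for op in latency}
    pending = {op: 0 for op in latency}  # unscheduled predecessors per op
    for op, preds in deps.items():
        for pred in preds:
            successors[pred].append(op)
            pending[op] += 1

    start = {op: 0 for op in latency}
    ready = deque(op for op in latency if pending[op] == 0)
    scheduled = 0
    while ready:
        op = ready.popleft()
        scheduled += 1
        finish = start[op] + latency[op]
        for succ in successors[op]:
            # A successor cannot start before its latest predecessor finishes.
            start[succ] = max(start[succ], finish)
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)
    if scheduled != len(latency):
        raise ValueError("dependency graph contains a cycle")
    return start
```

For example, with `latency = {"load_a": 2, "load_b": 2, "mul": 3, "add": 1}` and `deps = {"mul": ["load_a", "load_b"], "add": ["mul", "load_a"]}`, both loads start at cycle 0, `mul` at cycle 2, and `add` at cycle 5.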

Widening the Memory Bottleneck by Automatically-Compiled Application-Specific Speculation Mechanisms

Thielmann

Huthmann

Wink

et al.

2012

Embedded Systems Design With FPGAs

Self Cite
