2006
DOI: 10.1109/tcad.2006.870411

Hardware compilation of application-specific memory-access interconnect

Abstract: A major obstacle to successful high-level synthesis (HLS) of large-scale application-specific integrated-circuit (ASIC) systems is the presence of memory accesses to a shared-memory subsystem. The latency to access memory is often not statically predictable, which creates problems for scheduling operations dependent on memory reads. More fundamental is that dependences between accesses may not be statically provable (e.g., if the specification language permits pointers), which introduces memory-consistency …

Cited by 8 publications (3 citation statements) · References 38 publications
“…At the very least, ESL flow is supposed to incorporate a front-end with higher-level specifications (e.g., SystemC or ANSI-C) into the synthesis flow. There are several interesting publications related to ESL flow for asynchronous (bundled-delay) design [44], [114], [115]. This flow is actually closer to a software C compiler than to a hardware synthesis flow.…”
Section: Examples and Discussion
confidence: 99%
“…However, while most other studies focus on algorithm-centric reconfigurable computing architectures, our study focuses primarily on how to exploit memory-level parallelism. Recently, extracting memory-level parallelism in reconfigurable computing has attracted more attention [28][29][30][31]. For example, recent work [32] proposed a many-cache memory architecture that improves caching in commercially available FPGAs.…”
Section: Related Work
confidence: 99%
“…Budiu et al. identified two primary bottlenecks in their work: significant latency overheads due to a deeply pipelined memory arbitration tree that is necessary to support parallel memory requests [6], as well as performance constraints due to complex control flow [8]. Other projects have attempted to optimize the memory-access network for custom hardware by optimizing for the most frequent accesses [31], partitioning and distributing memory [25], or incorporating cache-like structures [32], [5]. Tartan, a reconfigurable architecture for spatial computation, was also developed [33].…”
Section: Related Work
confidence: 99%