2014 IEEE International Symposium on Parallel and Distributed Processing with Applications
DOI: 10.1109/ispa.2014.29
Trace-Based Reconfigurable Acceleration with Data Cache and External Memory Support

Abstract: This paper presents a binary acceleration approach based on extending a General Purpose Processor (GPP) with a Reconfigurable Processing Unit (RPU), both sharing an external data memory. In this approach repeating sequences of GPP instructions are migrated to the RPU. The RPU resources are selected and organized off-line using execution trace information. The RPU core is composed of Functional Units (FUs) that correspond to single CPU instructions. The FUs are arranged in stages of mutually independent operati…
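To make the trace-based idea concrete, the following is a minimal, hypothetical sketch of the off-line analysis step the abstract describes: scanning an execution trace for repeating instruction (or basic-block) sequences that are candidates for migration to the RPU. The trace format, function name, and parameters are illustrative assumptions, not the authors' actual tool.

```python
# Hypothetical sketch: finding repeating sequences in an execution trace,
# in the spirit of the paper's off-line trace analysis. The trace is modeled
# as a list of basic-block start addresses; real traces would carry more detail.
from collections import Counter

def find_repeating_sequences(trace, min_len=2, max_len=4):
    """Count every contiguous sub-sequence of addresses in the trace.

    Frequently repeating sub-sequences correspond to loop bodies / hotspots
    that are worth implementing on the reconfigurable unit.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(trace) - n + 1):
            counts[tuple(trace[i:i + n])] += 1
    return counts

# Toy trace where blocks (0x10, 0x14) repeat, as in a small loop body.
trace = [0x00, 0x10, 0x14, 0x10, 0x14, 0x10, 0x14, 0x20]
seq, freq = max(find_repeating_sequences(trace).items(),
                key=lambda kv: (kv[1], len(kv[0])))
print([hex(a) for a in seq], freq)  # the hottest repeating sequence
```

In a real flow, the selected sequences would then be translated into a datapath of Functional Units; this sketch only illustrates the candidate-selection idea.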

Cited by 5 publications (4 citation statements)
References 20 publications
“…For earlier row-oriented 2D RPUs, the translation was based on directly implementing the set of CDFGs as a configurable datapath. We started in [7] without support for pipelining, and added loop pipelining support in [9]. Our recent RPU is a 1D architecture and executes modulo scheduled loops.…”
Section: B. Mapping Stages
Mentioning confidence: 99%
“…We have considered different system organizations. The system in [9] (Fig. 3(a)) uses local memory for code, external memory for data, and a custom dual-port cache for the RPU, which can access the full range of the GPP's data.…”
Section: A. System Level Architecture
Mentioning confidence: 99%
“…Results obtained with a prototype implementation show that the approach is viable and can be used effectively to handle arbitrary hotspot functions, not just those located in shared library routines. Moreover, as discussed in Section 4, the approach can be extended to handle hotspots that are not necessarily subroutines of the original code (such as the "megablocks" of [16,17]).…”
Mentioning confidence: 99%