2013 International Conference on Field-Programmable Technology (FPT) 2013
DOI: 10.1109/fpt.2013.6718409

Efficient methods for out-of-order load/store execution for high-performance soft processors

Abstract: As FPGAs continue to increase in size, it becomes increasingly feasible and desirable to build higher performance soft processors. Preserving the familiar single-threaded programming model can be done with an out of order processor. The ability to execute memory loads and stores out of order has a large impact on performance, but this is difficult to do because the dependencies between stores and loads are not known until addresses are computed. Out of order memory disambiguation is traditionally done…

Citations: cited by 7 publications (8 citation statements)
References: 11 publications
“…This scheme is also impractical, as the LSQ would have as many entries as the number of static memory accesses in the application, which would be unrealistically large in all but the most trivial of cases. Others have shown that both the critical path (assuming single-cycle accesses) and resource requirement demands grow as a function of the number of LSQ entries [21]; our implementation results confirm this observation.…”
Section: Supplying a Sequential Order To The LSQ (supporting)
confidence: 86%
“…Although our design exhibits ample parallelism and performs most operations concurrently, some functionalities cannot be implemented in constant time; for instance, to bypass data from the store to the load queue, one needs to check the store queue from the head to the tail to find the last conflicting data, and this requires at best O(log n) time for an n-depth queue. This sensitivity to the number of queue entries is in line with results reported by others in conventional LSQ designs: previous efforts to implement conventional LSQs in FPGAs have exhibited the same trends of resource and clock degradation with queue size [21]. These results motivate us to consider alternative design options in the future; our group allocation policy is generally applicable and can be incorporated into different queue architectures.…”
Section: Resource Utilization and Timing Analysis (supporting)
confidence: 85%
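
The store-to-load bypass described in the statement above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the cited design: `StoreEntry` and `forward_from_store_queue` are hypothetical names, the queue is assumed to hold only stores older than the load, ordered from head (oldest) to tail (youngest), and addresses are assumed to match exactly with no partial overlap.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StoreEntry:
    """One pending store (hypothetical model, not the cited hardware)."""
    addr: int
    data: int


def forward_from_store_queue(stores: Sequence[StoreEntry],
                             load_addr: int) -> Optional[int]:
    """Return the data of the youngest older store to load_addr, or None.

    `stores` is ordered head (oldest) to tail (youngest) and is assumed to
    contain only stores that precede the load in program order.
    """
    # Walk from the tail towards the head so the first hit is the last
    # (youngest) conflicting store, which is the value the load must see.
    for entry in reversed(stores):
        if entry.addr == load_addr:
            return entry.data
    # No conflicting store: the load has to read memory instead.
    return None


if __name__ == "__main__":
    queue = [StoreEntry(0x10, 1), StoreEntry(0x20, 2), StoreEntry(0x10, 3)]
    print(forward_from_store_queue(queue, 0x10))
    print(forward_from_store_queue(queue, 0x30))
```

The loop here is sequential and takes time linear in the queue depth. A hardware version would typically compare all entries in parallel and pick the youngest match with a priority-selection tree, which is one way to read the quoted O(log n) bound.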
“…To avoid pipeline stalls due to unpredictable memory accesses, a circuit can use additional logic to handle memory accesses at runtime [20]. If proven safe to do so, the logic should allow loads from later loop iterations to be executed without waiting for stores from earlier iterations to commit.…”
Section: Runtime Memory Disambiguation in HLS (mentioning)
confidence: 99%
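
The "proven safe" condition in the statement above can be sketched as a conservative check. This is a minimal illustration, assuming exact address matching and a hypothetical helper name `load_may_issue`; it is not the logic of any cited design. An older store whose address has not been computed yet is modelled as `None`.

```python
from typing import Optional, Sequence


def load_may_issue(older_store_addrs: Sequence[Optional[int]],
                   load_addr: int) -> bool:
    """Conservatively decide whether a load may run ahead of older stores.

    `older_store_addrs` lists the addresses of all uncommitted stores that
    precede the load in program order; None marks an unknown address.
    """
    for addr in older_store_addrs:
        if addr is None:
            # Address not computed yet: independence cannot be proven.
            return False
        if addr == load_addr:
            # True dependence: the load must wait for (or be fed by) the store.
            return False
    # Every older store is known to target a different address.
    return True


if __name__ == "__main__":
    print(load_may_issue([0x10, 0x20], 0x30))
    print(load_may_issue([0x10, None], 0x30))
```

This check never speculates: a single unresolved store address blocks the load. Designs that predict dependences and recover from mispredictions trade this simplicity for more parallelism.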
“…Such functionality is most often implemented as a load-store queue (LSQ). Most LSQs aimed at HLS use a content-addressable memory (CAM) structure to implement the load and store queue [20], [23], with a similar operating principle as LSQs used in out-of-order CPUs [24]. CAMs map poorly to FPGA technology resulting in a high critical path and resource usage overhead [20], [25].…”
Section: Runtime Memory Disambiguation in HLS (mentioning)
confidence: 99%