2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) 2020
DOI: 10.1109/hpca47549.2020.00039

CASINO Core Microarchitecture: Generating Out-of-Order Schedules Using Cascaded In-Order Scheduling Windows

Cited by 10 publications (9 citation statements)
References 63 publications
“…We simulate the graph algorithms in two iterations where the first iteration is used to warm up the caches, and we report performance for the second iteration obtained through detailed simulation. Also, we skip the initialization and preprocessing steps during the formation of the graph using an in-built graph generator with a size of 2^18 nodes, formed according to the Kronecker distribution satisfying the Graph500 specifications.…”
Section: Methods (mentioning)
confidence: 99%
“…Shioya et al [35] propose the front-end execution architecture which executes instructions that have their operands ready in the front-end of the pipeline; other non-ready instructions are dispatched to the out-of-order back-end. CASINO [18] pursues a similar goal by augmenting an in-order core with an additional speculative queue from which ready instructions are executed ahead of a traditional in-order instruction queue. CASINO adds significant complexity over an in-order core because of the CAM-based selection logic in the speculative queue and dynamic memory disambiguation.…”
Section: Related Work (mentioning)
confidence: 99%
“…We conducted simulation using gem5 [6] which is configured with 2-issue, 2.5GHz dual-core processor with 32KB/64KB 2-way set-associative L1 instruction/data caches (2 cycles hit) and a unified 128KB 16-way set-associative L2 cache (20 cycles hit) to model an ARM Cortex-A53 processor [2]. The store buffer size is set to 4 as with the recent work that simulates the Cortex-A53 core [28], and the default CLQ size is 2. According to prior works [67][68][69][70], 300-30 deployed acoustic sensors can achieve 10-30 cycles of the worst-case detection latency (WCDL) with the area cost of less than 1% of die size, and therefore we set the default WCDL to 10 cycles.…”
Section: Implementation and Evaluation, 6.1 Methodology (mentioning)
confidence: 99%
“…As the OoO queue handles fewer instructions, FIFOrder reduces its depth and width, thus reducing the scheduling energy cost. Another recent architecture, CASINO core [16], also targets ready instructions to simplify instruction scheduling.…”
Section: Energy-efficient Core Design (mentioning)
confidence: 99%