2015
DOI: 10.1007/978-3-319-21909-7_38

Toward a Core Design to Distribute an Execution on a Manycore Processor

Abstract: This paper presents a parallel execution model and a manycore processor design to run C programs in parallel. The model automatically builds parallel sections of machine instructions from the run trace. It parallelizes instruction fetches, renamings, executions and retirements. Predictor-based fetch is replaced by a fetch-decode-and-partly-execute stage able to compute, in order, most of the control instructions. Tomasulo's register renaming is extended to memory with a technique to match consumer/prod…
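The renaming step in the abstract can be pictured with a small model. The sketch below is an illustration under stated assumptions, not the paper's mechanism: it applies Tomasulo-style renaming to a recorded trace in which every memory address is already known, treating each address as one more architectural register. The trace format (a destination location plus a list of source locations) and the `mem:` naming are hypothetical. Each write receives a fresh tag, and each read picks up the tag of the latest producer of that location.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple

# Hypothetical trace entry: (destination location or None, source locations).
# A location is a register name such as "rax" or a known address such as "mem:0x10".
Entry = Tuple[Optional[str], List[str]]
Renamed = Tuple[Optional[str], List[str]]


def rename(trace: List[Entry]) -> List[Renamed]:
    """Tomasulo-style renaming applied uniformly to registers and memory."""
    fresh = count()
    latest: Dict[str, str] = {}  # location -> tag of its latest producer
    out: List[Renamed] = []
    for dest, sources in trace:
        # A source reads its latest producer's tag, or the architectural
        # location itself if nothing earlier in the trace wrote it.
        srcs = [latest.get(loc, loc) for loc in sources]
        tag = None
        if dest is not None:
            tag = f"t{next(fresh)}"  # every write gets a brand-new name
            latest[dest] = tag
        out.append((tag, srcs))
    return out


TRACE: List[Entry] = [
    ("mem:0x10", ["rax"]),  # store rax -> [0x10]
    ("rbx", ["mem:0x10"]),  # load  [0x10] -> rbx
    ("mem:0x10", ["rcx"]),  # store rcx -> [0x10] (reuses the address)
    ("rdx", ["mem:0x10"]),  # load  [0x10] -> rdx
]

if __name__ == "__main__":
    for entry in rename(TRACE):
        print(entry)
```

Because the second store receives its own tag, it no longer has to wait for the earlier load of the same address; only the true producer-to-consumer edges survive. Matching loads to stores when addresses are not known in advance is the harder problem the abstract alludes to, and this sketch sidesteps it.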

Cited by 2 publications (10 citation statements); references 15 publications.
“…To avoid complications in the trace building, the hardware in [5] computes the control-instruction targets rather than predicting them. Computing is slower than predicting, but computing tens of branches in parallel is more efficient than predicting tens of branches in sequence, parallelism being more cost-effective than a sequential predictor, even a perfect one.…”
Section: A Deterministic and Parallel Run of C Code (mentioning)
confidence: 99%
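The idea of computing control-flow targets instead of predicting them can be illustrated with a toy front end. In this sketch the mini-ISA (`addi`, `blt`, `jmp`, and an opaque `work` operation standing for everything not needed to steer control flow) is hypothetical, and the front end simply executes, in order, the arithmetic that branches depend on. Every branch is resolved from values already computed, so the resulting trace of program counters contains no mispredictions. It does not model the parallel resolution of many branches that the quotation argues for.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-ISA:
#   ("addi", rd, rs, imm)    rd = rs + imm
#   ("blt", ra, rb, target)  branch to target if ra < rb
#   ("jmp", target)          unconditional jump
#   ("work", name)           data work left for the back end; never steers control
Instr = Tuple


def build_trace(program: List[Instr], regs: Dict[str, int],
                max_steps: int = 10_000) -> List[int]:
    """Fetch in order, computing every branch outcome instead of predicting it.

    Returns the sequence of program counters the run actually visits.
    """
    regs = dict(regs)  # work on a private copy of the register file
    trace: List[int] = []
    pc = 0
    while 0 <= pc < len(program) and len(trace) < max_steps:
        op = program[pc]
        trace.append(pc)
        kind = op[0]
        if kind == "addi":
            _, rd, rs, imm = op
            regs[rd] = regs[rs] + imm
            pc += 1
        elif kind == "blt":
            _, ra, rb, target = op
            # Operands are already known, so the target is computed exactly.
            pc = target if regs[ra] < regs[rb] else pc + 1
        elif kind == "jmp":
            pc = op[1]
        else:  # "work": fetched and recorded, not executed here
            pc += 1
    return trace


# A three-iteration loop: r1 counts up to r2.
LOOP: List[Instr] = [
    ("addi", "r1", "r1", 1),
    ("work", "loop_body"),
    ("blt", "r1", "r2", 0),
    ("work", "after_loop"),
]

if __name__ == "__main__":
    print(build_trace(LOOP, {"r1": 0, "r2": 3}))
```

The `addi` that feeds the branch is executed in the front end, while `work` entries are only recorded and left for an execution back end, which this sketch does not model.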
“…x86 register rsp), meaning that both paths use the same stack area. The hardware in [5] also copies rbp, rdi, rsi and rbx. These copies are better than push/pop because a push in a function prologue and a pop in its epilogue create RAW dependences between the epilogue and the prologue of the next function call, serializing them.…”
Section: The Fork Machine Instruction (mentioning)
confidence: 99%
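The serialization argument can be checked with a small dependence finder. The sketch below records, for each trace entry, the locations it reads and writes, and links every read to the most recent earlier write of the same location (a read-after-write dependence). The labels, the single `stack` slot, and the `rsp@1`-style names used to model per-path register copies are hypothetical simplifications, not the hardware's actual encoding.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical trace entry: (label, locations read, locations written).
Entry = Tuple[str, Set[str], Set[str]]


def raw_dependences(trace: List[Entry]) -> Set[Tuple[str, str]]:
    """Return (producer, consumer) pairs: the consumer reads a location
    whose most recent earlier writer is the producer."""
    last_writer: Dict[str, str] = {}
    deps: Set[Tuple[str, str]] = set()
    for label, reads, writes in trace:
        # Reads are resolved before this entry's own writes take effect.
        for loc in reads:
            if loc in last_writer:
                deps.add((last_writer[loc], label))
        for loc in writes:
            last_writer[loc] = label
    return deps


# Two calls that save and restore rbx with push/pop on a shared stack.
PUSH_POP: List[Entry] = [
    ("call1.push", {"rsp", "rbx"}, {"rsp", "stack"}),
    ("call1.body", {"x"}, {"y"}),
    ("call1.pop", {"rsp", "stack"}, {"rsp", "rbx"}),
    ("call2.push", {"rsp", "rbx"}, {"rsp", "stack"}),
    ("call2.body", {"x"}, {"z"}),
    ("call2.pop", {"rsp", "stack"}, {"rsp", "rbx"}),
]

# The same two calls when a fork copies rsp and rbx into each new path.
COPIES: List[Entry] = [
    ("fork1", {"rsp", "rbx"}, {"rsp@1", "rbx@1"}),
    ("fork2", {"rsp", "rbx"}, {"rsp@2", "rbx@2"}),
    ("call1.body", {"rsp@1", "x"}, {"y"}),
    ("call2.body", {"rsp@2", "x"}, {"z"}),
]

if __name__ == "__main__":
    print(sorted(raw_dependences(PUSH_POP)))
    print(sorted(raw_dependences(COPIES)))
```

With push/pop, the second call's prologue reads `rsp` and `rbx` last written by the first call's epilogue, so the two calls are chained. With per-path copies, each call body depends only on the fork that created it.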