Abstract. This paper presents a parallel execution model and a manycore processor design to run C programs in parallel. The model automatically builds parallel sections of machine instructions from the run trace. It parallelizes instruction fetches, renamings, executions, and retirements. Predictor-based fetch is replaced by a fetch-decode-and-partly-execute stage able to compute most control instructions in order. Tomasulo's register renaming is extended to memory with a technique to match consumer/producer pairs. The Reorder Buffer is adapted to allow parallel retirement. The model is presented on a sum reduction example, which is also used to give a short analytical evaluation of the model's performance potential.
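To make the running example concrete, here is a minimal recursive sum reduction in plain C (a sketch of our own; the paper's exact benchmark code is not reproduced here). The two recursive calls at each level are independent of each other, so a machine that builds parallel sections from the trace can fetch, rename, execute, and retire them concurrently.

```c
/* A minimal sketch (our assumption, not the paper's exact benchmark):
 * a recursive sum reduction written as ordinary sequential C. Each
 * call site is a point where the model can open a parallel section,
 * because the two half-sums only meet at the final addition. */
#include <stdio.h>

static long sum(const long *a, int n)
{
    if (n == 1)
        return a[0];
    /* The two recursive calls have no data dependence on each other;
     * renamed registers and memory let them proceed in parallel. */
    return sum(a, n / 2) + sum(a + n / 2, n - n / 2);
}

int main(void)
{
    long a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("%ld\n", sum(a, 8));   /* prints 36 */
    return 0;
}
```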
Summary. This paper presents an alternative method to parallelize programs, better suited to manycore processors than current operating-system- and API-based approaches like OpenMP and MPI. The method relies on parallelizing hardware and an adapted programming style. It frees and captures instruction-level parallelism (ILP). A manycore design is presented in which cores are multithreaded and able to fork new threads. The programming style is based on functions. The hardware creates a concurrent thread at each function call. Together, the programming style and the hardware free the ILP by eliminating the architectural dependences between a call and its continuation after return. We illustrate the method on a sum reduction, a matrix multiplication, and a sort. We measure the ILP of the parallel runs and show that it is high enough to feed thousands of cores because it increases with data size. We compare our method to pthread parallelization, showing that (1) our parallel execution is deterministic, (2) our thread management is cheap, (3) our parallelism is implicit, and (4) our method parallelizes both functions and loops. Implicit parallelism makes parallel code easy to write and read; deterministic parallel execution makes it easy to debug.
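To make the pthread comparison concrete, the sketch below shows the same reduction parallelized explicitly with pthreads (our own illustration, not the paper's measured code). Thread creation, argument marshalling, and joining are all explicit, which is exactly the coarse-grain, non-implicit style the summary argues against: the fork and the synchronization must be written by hand, and getting them wrong makes results non-deterministic.

```c
/* A pthread version of the sum reduction, for contrast (a sketch;
 * the paper's measured pthread code may differ). Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

struct range { const long *a; int n; long result; };

static void *psum(void *arg)
{
    struct range *r = arg;
    r->result = 0;
    for (int i = 0; i < r->n; i++)
        r->result += r->a[i];
    return NULL;
}

int main(void)
{
    long a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    struct range lo = {a, 4, 0}, hi = {a + 4, 4, 0};
    pthread_t t;

    pthread_create(&t, NULL, psum, &hi);    /* explicit fork */
    psum(&lo);                              /* main thread sums lower half */
    pthread_join(t, NULL);                  /* explicit synchronization */
    printf("%ld\n", lo.result + hi.result); /* prints 36 */
    return 0;
}
```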
This paper presents a new distributed computation model adapted to manycore processors. In this model, the run is spread over the available cores by fork machine instructions produced by the compiler, for example at function calls and loop iterations. This approach contrasts with the current model of computation based on caches and predictors: cache efficiency relies on data locality, and predictor efficiency relies on the reproducibility of control flow, both of which are less effective when the execution is distributed. The proposed computation model is based on a new core hardware design, whose main features are described in this paper. This new core is the building block of a manycore design. The processor automatically parallelizes an execution. It keeps the computation deterministic by constructing a totally ordered trace of the machine instructions run. References are renamed, including memory references, which fixes the communication and synchronization needs. When a value is referenced, its producer is found in the trace and the reader is synchronized with the writer. This paper shows how a consumer can be located on the same core as its producer, improving parallel locality and parallelization quality. Our deterministic and fine-grain distribution of a run on a manycore processor is compared with OS-primitive- and API-based parallelization (e.g. pthread, OpenMP, or MPI) and with compiler-based automatic parallelization of loops. The former implies (i) high OS overhead, meaning that only coarse-grain parallelization is cost-effective, and (ii) non-deterministic behaviour, meaning that appropriate synchronization to eliminate wrong results is a challenge. The latter is unable to fully parallelize general-purpose programs due to structures like functions, complex loops, and branches.
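As an illustration of the kind of loop the compiler could split with fork instructions, consider the sketch below (hypothetical on our part; the fork ISA itself is not shown in this abstract). After memory renaming, the iterations of the first loop share no producer/consumer pairs among themselves, so each iteration can be forked to a different core; the second loop is the consumer whose reads of c[] must each be matched with the corresponding producer in the trace.

```c
/* A sketch of a compiler-forkable loop (our illustration). Each
 * iteration of the first loop writes c[i] from a[i] and b[i] only,
 * so the iterations are mutually independent once memory is renamed.
 * The print loop consumes every c[i], synchronizing each read with
 * its unique producer iteration. */
#include <stdio.h>

#define N 4

int main(void)
{
    int a[N] = {1, 2, 3, 4}, b[N] = {10, 20, 30, 40}, c[N];

    for (int i = 0; i < N; i++)   /* each iteration is forkable */
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)   /* consumer: reads every c[i] */
        printf("%d ", c[i]);
    printf("\n");                 /* prints: 11 22 33 44 */
    return 0;
}
```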