Abstract-Current Operating Systems (OS) perceive the different contexts of Simultaneous Multithreaded (SMT) processors as multiple independent processing units, although, in reality, threads executed in these units compete for the same hardware resources. Furthermore, hardware resources are assigned to threads implicitly, as determined by the SMT instruction fetch (Ifetch) policy, without the control of the OS. Both factors cause a lack of control over how individual threads are executed, which can frustrate the work of the job scheduler. This presents a problem for general-purpose systems, where the OS job scheduler cannot enforce priorities, and also for embedded systems, where it would be difficult to guarantee worst-case execution times. In this paper, we propose a novel strategy that enables a two-way interaction between the OS and the SMT processor and allows the OS to run jobs at a certain percentage of their maximum speed, regardless of the workload in which these jobs are executed. In contrast to previous approaches, our approach enables the OS to run time-critical jobs without dedicating all internal resources to them, so that non-time-critical jobs can make significant progress as well, and without significantly compromising overall throughput. In fact, our mechanism, in addition to fulfilling OS requirements, achieves 90 percent of the throughput of one of the best currently known fetch policies for SMTs.
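The abstract describes the OS/SMT interaction only at a high level. The following is a minimal sketch of how such a tuning loop could look, assuming a hypothetical performance-counter and fetch-share interface; all function and register names below are illustrative, not the paper's actual design.

```c
/*
 * Sketch: periodically sample the time-critical job's IPC, compare it
 * against an estimate of its IPC when running alone, and grow or shrink
 * its share of shared resources (e.g., fetch slots) until it runs at
 * the OS-requested percentage of its maximum speed.
 * The hardware interface below is a hypothetical assumption.
 */
#include <stdint.h>

#define TARGET_PCT     80      /* OS-requested fraction of full speed  */
#define SAMPLE_CYCLES  100000  /* length of one monitoring interval    */

extern uint64_t read_retired_insns(int ctx);      /* hypothetical PMU read */
extern void     set_fetch_share(int ctx, int n);  /* hypothetical knob     */

void tune_critical_job(int ctx, double ipc_alone)
{
    static int share = 4;                  /* fetch slots currently granted */
    uint64_t before = read_retired_insns(ctx);

    /* ... wait SAMPLE_CYCLES cycles ... */

    double ipc_now = (double)(read_retired_insns(ctx) - before) / SAMPLE_CYCLES;
    double pct     = 100.0 * ipc_now / ipc_alone;

    if (pct < TARGET_PCT && share < 8)
        set_fetch_share(ctx, ++share);     /* too slow: grant more resources */
    else if (pct > TARGET_PCT + 5 && share > 1)
        set_fetch_share(ctx, --share);     /* overshooting: release resources
                                              so other threads can progress  */
}
```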
Abstract-We present Task Superscalar, an abstraction of the instruction-level out-of-order pipeline that operates at the task level. Like ILP pipelines, which uncover parallelism in a sequential instruction stream, task superscalar uncovers task-level parallelism among tasks generated by a sequential thread. Utilizing intuitive programmer annotations of task inputs and outputs, the task superscalar pipeline dynamically detects inter-task data dependencies, identifies task-level parallelism, and executes tasks out-of-order. Furthermore, we propose a design for a distributed task superscalar pipeline frontend that can be embedded into any manycore fabric and manages cores as functional units. We show that our proposed mechanism is capable of driving hundreds of cores simultaneously with non-speculative tasks, which allows our pipeline to sustain work windows consisting of tens of thousands of tasks. We further show that our pipeline can maintain a decode rate faster than 60ns per task and dynamically uncover data dependencies among as many as ~50,000 in-flight tasks, using 7MB of on-chip eDRAM storage. This configuration achieves speedups of 95-255x (average 183x) over sequential execution for nine scientific benchmarks, running on a simulated CMP with 256 cores. Task superscalar thus enables programmers to exploit manycore systems effectively, while simultaneously simplifying their programming model.
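To make the annotation-driven dependency detection concrete, here is a minimal sketch of a task "decode" step that derives inter-task dependencies from declared inputs and outputs, by analogy with register renaming. The data structures and helper functions are assumptions for illustration, not the paper's actual pipeline frontend.

```c
/*
 * Sketch: each task declares its inputs and outputs; a renaming-style
 * table records the last task that writes each datum. Decoding a task
 * adds dependence edges on those producers; tasks with no unresolved
 * inputs are issued immediately, possibly out of program order.
 */
#include <stddef.h>

#define MAX_DEPS 16

struct task {
    void  *in[MAX_DEPS];  size_t n_in;    /* annotated inputs            */
    void  *out[MAX_DEPS]; size_t n_out;   /* annotated outputs           */
    int    unresolved;                    /* inputs not yet produced     */
};

/* Assumed helpers: last-writer table, consumer lists, ready queue. */
struct task *last_writer(void *datum);
void         set_last_writer(void *datum, struct task *t);
void         add_consumer(struct task *producer, struct task *consumer);
void         enqueue_ready(struct task *t);   /* dispatch to a core      */

void decode_task(struct task *t)
{
    t->unresolved = 0;

    for (size_t i = 0; i < t->n_in; i++) {
        struct task *p = last_writer(t->in[i]);
        if (p) {                        /* true (RAW) dependence         */
            add_consumer(p, t);         /* wake t when p completes       */
            t->unresolved++;
        }
    }
    for (size_t i = 0; i < t->n_out; i++)
        set_last_writer(t->out[i], t);  /* later readers depend on t     */

    if (t->unresolved == 0)
        enqueue_ready(t);               /* data-ready: run immediately   */
}
```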
Fetch performance is a very important factor because it effectively limits the overall processor performance. However,