Modern computer systems extract parallelism from problems at two extremes of granularity: instruction-level parallelism (ILP) and coarse-thread parallelism. VLIW and superscalar processors exploit ILP at the grain of a single instruction, while multiprocessors extract parallelism from coarse threads with a granularity of many thousands of instructions. The parallelism available at these two extremes is limited. The ILP in applications is restricted by control flow and data dependencies [17], and the hardware of superscalar designs does not scale: both the instruction-scheduling logic and the register file of a superscalar grow quadratically as the number of execution units is increased. Multicomputers, in turn, find limited coarse-thread parallelism at small problem sizes and in many applications.

This paper describes and evaluates the hardware mechanisms implemented in the MIT Multi-ALU Processor (MAP chip) for extracting fine-thread parallelism. Fine threads close the parallelism gap between the single-instruction granularity of ILP and the thousand-instruction granularity of coarse threads by extracting parallelism at a granularity of 50-1000 instructions. This parallelism is orthogonal and complementary to both coarse-thread parallelism and ILP. A program can be accelerated using coarse threads to extract parallelism from outer loops and large co-routines, fine threads to extract parallelism from inner loops and small subcomputations, and ILP to extract parallelism from subexpressions. Because they extract parallelism from different portions of a program, coarse threads, fine threads, and ILP work synergistically to provide multiplicative speedup.

These three modes are also well matched to the architecture of modern multiprocessors. ILP is well suited to extracting parallelism across the execution units of a single processor. Fine threads are appropriate for execution across multiple processors at a single node of a parallel computer, where interaction latencies are on the order of a few cycles. Coarse threads are appropriate for execution on different nodes of a multiprocessor, where interaction latencies are inherently hundreds of cycles.

Low-overhead mechanisms for communication and synchronization are required to exploit fine-grain thread-level parallelism. The cost to initiate a task, pass it arguments, synchronize with its completion, and return results must be small compared to the work accomplished by the task. Such inter-thread interaction requires hundreds of cycles on conventional multiprocessors and thousands of cycles on multicomputers. Because of these high overheads, most parallel applications use only coarse threads, with many thousands of instructions between interactions. The Multi-ALU Processor (MAP) chip provides three on-chip processors and methods for quickly communicating and synchronizing among ...

¹The research described in this paper was supported by the Defense Advanced Research Projects Agency and monitored by the Air Force Electronic Systems Division under contract F19628-92-C-0045.
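To make the fork/join overhead argument concrete, the sketch below parallelizes an inner loop (a blocked dot product) using ordinary Go goroutines. This is an illustration of the initiate/argument-pass/synchronize/return cycle discussed above, not the MAP chip's hardware interface; the function names and the 256-element chunk size are choices made for this example. With software threads, that cycle costs hundreds to thousands of cycles, so each task must be large enough to amortize it; the MAP's hardware mechanisms aim to shrink exactly this overhead so that 50-1000 instruction tasks become profitable.

```go
package main

import (
	"fmt"
	"sync"
)

// dot computes a partial dot product over x[lo:hi] and y[lo:hi].
// A 256-element chunk is on the order of a few hundred instructions,
// i.e. the fine-thread grain size discussed above.
func dot(x, y []float64, lo, hi int) float64 {
	s := 0.0
	for i := lo; i < hi; i++ {
		s += x[i] * y[i]
	}
	return s
}

func main() {
	const n, chunk = 1024, 256
	x := make([]float64, n)
	y := make([]float64, n)
	for i := range x {
		x[i], y[i] = 1, 2
	}

	// Fork one task per chunk, then join and reduce the partial sums.
	// Each goroutine fork, argument pass, and WaitGroup join is the
	// inter-thread interaction whose cost must be small relative to
	// the work done inside dot().
	parts := make([]float64, n/chunk)
	var wg sync.WaitGroup
	for t := 0; t < n/chunk; t++ {
		wg.Add(1)
		go func(t int) {
			defer wg.Done()
			parts[t] = dot(x, y, t*chunk, (t+1)*chunk)
		}(t)
	}
	wg.Wait()

	total := 0.0
	for _, p := range parts {
		total += p
	}
	fmt.Println(total) // 1024 elements, each contributing 1*2 = 2: prints 2048
}
```

If the chunk size were shrunk toward a few instructions of work, the fork/join cost would dominate and the parallel version would run slower than the sequential loop; this is precisely the regime where hardware support for fine threads is needed.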