Mark Smotherman scite author profile

Multiple issue of instructions occurs in superscalar and VLIW machines. This paper investigates a third type of machine design, which combines the advantages of code compatibility as in superscalars and the absence of complex dependency-checking logic from the decoder as in VLIW. In this design, a stream of scalar instructions is executed by the hardware and is simultaneously compacted into VLIW-type instructions, which are then stored in a structure called a shadow cache. When a shadow cache line contains the instructions requested by the fetch unit, the scalar instruction stream is preempted and all operations in the shadow cache line are simultaneously issued and executed. The mechanism that compacts instructions is called a ll unit, and was rst proposed for dynamically compacting microoperations into large executable units by Melvin, Shebanow, and Patt in 1988. We have extended their approach to directly handle data dependencies, delayed branches, and speculative execution (using branch prediction). This approach is evaluated using the MIPS architecture, and a sixfunctional-unit machine is found to be 52 to 108% faster than a single-issue processor for unrecompiled SPECint92 benchmarks.

show abstract

The hybrid automated reliability predictor

Dugan¹,

Trivedi²,

Smotherman³

et al. 1986

Journal of Guidance, Control, and Dynamics

114

View full text Add to dashboard Cite

Efficient DAG construction and heuristic calculation for instruction scheduling

Smotherman

Krishnamurthy

Aravind

et al. 1991

View full text Add to dashboard Cite

Automatic benchmark generation for cache optimization of matrix operations

McCalpin

Smotherman

1995

View full text Add to dashboard Cite

Computationally intensive algorithms must usually be restructured to make the best use of cache memory in current high-performance, hierarchical memory computers. Unfortunately, cache conscious algorithms are sensitive to object sizes and addresses as well as the details of the cache and translation lookaside buffer geometries, and this sensitivity makes both automatic restructuring and hand-tuning difficult tasks. An optimization approach is presented in this paper that automatically generates and executes a benchmark program from a concise specification of the algorithm's structure. This technique provides the performance data needed for verification of code generation heuristics or search among the various restructuring options. Matrix transpose and matrix multiplication are examined using this approach for several workstations with restructuring options of loop order, tiling (blocking), and unrolling.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Mark Smotherman

A non-homogeneous Markov model for phased-mission reliability analysis

A fill-unit approach to multiple instruction issue

The hybrid automated reliability predictor

Efficient DAG construction and heuristic calculation for instruction scheduling

Automatic benchmark generation for cache optimization of matrix operations

Contact Info

Product

Resources

About