University of Colorado, Campus Box 430, Boulder, CO 80309-0430
grunwald@cs.colorado.edu

Richard Neves
P.O. Box 218, IBM T. J. Watson Research, Yorktown Heights, NY 10598
rneves@watson.ibm.com
Abstract

Modern languages and operating systems often encourage programmers to use threads, or independent control streams, to mask the overhead of some operations and simplify program structure. Multitasking operating systems use threads to mask communication latency, either with hardware devices or with users. Client-server applications typically use threads to simplify the complex control flow that arises when multiple clients are used. Recently, the scientific computing community has started using threads to mask network communication latency in massively parallel architectures, allowing computation and communication to be overlapped. Lastly, some architectures implement threads in hardware, using those threads to tolerate memory latency.

In general, it would be desirable if threaded programs could be written to expose the largest degree of parallelism possible, or to simplify the program design. However, threads incur time and space overheads, and programmers often compromise simple designs for performance. In this paper, we show how to reduce time and space thread overhead using control flow and register liveness information inferred after compilation. Our techniques work on binaries, are not specific to a particular compiler or thread library, and reduce the overall execution time of fine-grain threaded programs by 15-30%. We use execution-driven analysis and an instrumented operating system to show why the execution time is reduced and to indicate areas for future work.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
ASPLOS VII 10/96 MA, USA
© 1996 ACM 0-89791-767-7/96/0010...$3.50

Although we are primarily interested in scientific applications, the techniques we describe may be applicable to a wide range of application domains, including threaded databases, client-server applications, and in-kernel operating system threads.