It is often desirable, for reasons of clarity, portability, and efficiency, to write parallel programs in which the number of processes is independent of the number of available processors. Several modern operating systems support more than one process in an address space, but the overhead of creating and synchronizing kernel processes can be high. Many runtime environments implement lightweight processes (threads) in user space, but this approach usually results in second-class status for threads, making it difficult or impossible to perform scheduling operations at appropriate times (e.g., when the current thread blocks in the kernel). In addition, a lack of common assumptions may make it difficult for parallel programs or library routines that use dissimilar thread packages to communicate with each other, or to synchronize access to shared data. We describe a set of kernel mechanisms and conventions designed to accord first-class status to user-level threads, allowing them to be used in any reasonable way that traditional kernel-provided processes can be used, while leaving the details of their implementation to user-level code. The key features of our approach are (1) shared memory for asynchronous communication between the kernel and the user, (2) software interrupts for events that might require action on the part of a user-level scheduler, and (3) a scheduler interface convention that facilitates interactions in user space between dissimilar kinds of threads. We have incorporated these mechanisms in the Psyche parallel operating system, and have used them to implement several different kinds of user-level threads. We argue for our approach in terms of both flexibility and performance.
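The abstract describes the Psyche mechanisms only at a high level. As a rough illustration of the general structure (not the Psyche interface itself), the C sketch below shows a data area shared between the kernel and a user-level thread package, and a software-interrupt handler that forwards kernel events to a user-level scheduler through a uniform upcall convention. All names (shared_area, sched_upcall_t, software_interrupt_handler, the event codes) are hypothetical.

```c
/* Hypothetical sketch: a data area shared between the kernel and a
 * user-level thread package, plus a software-interrupt handler that
 * lets the kernel notify the user-level scheduler of events such as
 * "current thread blocked in the kernel".  None of these names come
 * from Psyche; they only illustrate the general structure. */

#include <stdint.h>

enum kernel_event {                      /* events the kernel may report */
    EV_NONE = 0,
    EV_THREAD_BLOCKED,                   /* current thread blocked in the kernel */
    EV_THREAD_UNBLOCKED,                 /* a previously blocked thread is runnable */
    EV_TIMER                             /* time-slice end / preemption warning */
};

struct shared_area {                     /* read and written asynchronously by both sides */
    volatile int      kernel_events;     /* bitmask of pending kernel_event bits */
    volatile int      interrupts_masked; /* user scheduler is in a critical section */
    volatile uint64_t blocked_thread_id; /* which user thread the event refers to */
};

/* Scheduler-interface convention: every thread package in the address
 * space exports an entry point of this shape, so dissimilar packages
 * can be invoked uniformly when an event arrives. */
typedef void (*sched_upcall_t)(struct shared_area *area, enum kernel_event ev);

static struct shared_area *sa;           /* mapped by the runtime at startup */
static sched_upcall_t      scheduler;    /* the package's scheduling routine */

/* Software-interrupt handler installed by the user-level runtime; the
 * kernel delivers it (much like a signal) whenever kernel_events != 0. */
void software_interrupt_handler(void)
{
    if (sa->interrupts_masked)           /* defer; the scheduler polls the area later */
        return;
    int pending = sa->kernel_events;     /* snapshot and acknowledge pending events */
    sa->kernel_events = 0;
    if (pending & (1 << EV_THREAD_BLOCKED))
        scheduler(sa, EV_THREAD_BLOCKED);    /* run another user-level thread */
    if (pending & (1 << EV_THREAD_UNBLOCKED))
        scheduler(sa, EV_THREAD_UNBLOCKED);  /* make the blocked thread runnable again */
    if (pending & (1 << EV_TIMER))
        scheduler(sa, EV_TIMER);             /* voluntary preemption point */
}
```

The shared area lets the kernel and the user-level scheduler exchange state without a system call on every event, while the software interrupt guarantees the scheduler actually gets control when an event needs immediate action.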
Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is extremely time-consuming and does not lend itself to predicting performance under varying conditions. Analytic modeling and scalability analysis provide predictive power, but are not widely used in practice, due primarily to their emphasis on asymptotic behavior and the difficulty of developing accurate models that work for real-world programs. In this paper we describe a set of tools for performance tuning of parallel programs that bridges this gap between measurement and modeling. Our approach is based on lost cycles analysis, which involves measurement and modeling of all sources of overhead in a parallel program. We first describe a tool for measuring overheads in parallel programs that we have incorporated into the runtime environment for Fortran programs on the Kendall Square KSR1. We then describe a tool that fits these overhead measurements to analytic forms. We illustrate the use of these tools by analyzing the performance tradeoffs among parallel implementations of 2D FFT. These examples show how our tools enable programmers to develop accurate performance models of parallel applications without requiring extensive performance modeling expertise.
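The abstract gives no formulas. A common way to define total lost cycles, used here purely as an illustration with made-up numbers, is LC(p) = p * T(p) - T(1): all processor-time not devoted to useful work. The sketch below computes lost cycles from hypothetical timings and fits them to a simple analytic form a + b*p by ordinary least squares; the actual tools break overhead into categories and support richer model forms than this.

```c
/* Illustrative sketch of lost-cycles analysis (hypothetical data, not
 * the KSR1 tools themselves): compute total lost cycles
 *     LC(p) = p * T(p) - T(1)
 * from measured execution times, then fit LC(p) to the analytic form
 * a + b*p with ordinary least squares. */

#include <stdio.h>

int main(void)
{
    /* Hypothetical measurements: processor counts and wall-clock times (s). */
    const int    procs[]  = { 1, 2, 4, 8, 16 };
    const double time_s[] = { 100.0, 52.0, 27.5, 15.0, 8.8 };
    const int    n = sizeof procs / sizeof procs[0];

    double lc[8];                        /* lost cycles, in processor-seconds */
    for (int i = 0; i < n; i++)
        lc[i] = procs[i] * time_s[i] - time_s[0];

    /* Least-squares fit of lc[i] = a + b * procs[i]. */
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        sx  += procs[i];
        sy  += lc[i];
        sxx += (double)procs[i] * procs[i];
        sxy += procs[i] * lc[i];
    }
    double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    double a = (sy - b * sx) / n;

    for (int i = 0; i < n; i++)
        printf("p=%2d  lost=%7.2f  model=%7.2f\n",
               procs[i], lc[i], a + b * procs[i]);
    printf("fitted overhead model: LC(p) = %.2f + %.2f * p\n", a, b);
    return 0;
}
```

Once an analytic form like this fits the measured overheads, it can be extrapolated to processor counts or problem sizes that were never measured, which is the predictive power the measure-modify approach lacks.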
Although several recent papers have proposed architectural support for program debugging and profiling, most processors do not yet provide even basic facilities, such as an instruction counter. As a result, system developers have been forced to invent software solutions. This paper describes our implementation of a software instruction counter for program debugging. We show that an instruction counter can be reasonably implemented in software, often with less than 10% execution overhead. Our experience suggests that a hardware instruction counter is not necessary for a practical implementation of watchpoints and reverse execution; however, it would make program instrumentation much easier for the system developer.
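As a rough illustration of the general idea (not the paper's implementation), a software instruction counter can be approximated by instrumenting the points where execution can repeat, such as loop backedges: each such point decrements a counter, and when the counter expires the debugger regains control, which is enough to re-execute a program to a precise earlier point. The sketch below shows a manual, source-level version of that instrumentation; the names (SIC_TICK, sic_remaining, sic_stop) are hypothetical.

```c
/* Hypothetical source-level sketch of a software instruction counter:
 * a counter decremented at every loop backedge (in a real system the
 * compiler or a binary rewriter would insert this at every backward
 * branch).  When the counter reaches zero, a debugger stop routine
 * runs, supporting watchpoints and replay to a precise earlier point. */

#include <stdio.h>
#include <stdlib.h>

static long sic_remaining = -1;          /* -1 means counting is disabled */

static void sic_stop(void)               /* stand-in for trapping to a debugger */
{
    fprintf(stderr, "software instruction counter expired\n");
    exit(1);
}

#define SIC_TICK()                                        \
    do {                                                  \
        if (sic_remaining >= 0 && --sic_remaining < 0)    \
            sic_stop();                                   \
    } while (0)

int main(void)
{
    sic_remaining = 1000;                /* stop after roughly 1000 backedges */

    long sum = 0;
    for (long i = 0; i < 1000000; i++) {
        sum += i;
        SIC_TICK();                      /* instrumentation at the loop backedge */
    }
    printf("sum = %ld\n", sum);          /* not reached: the counter expires first */
    return 0;
}
```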