Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted processors. In this context, the number of processors allotted to a particular job may vary during the job's execution, and the task scheduler must adapt to these changes in processor resources. For overall system efficiency, the task scheduler should also provide parallelism feedback to the job scheduler to avoid the situation where a job is allotted processors that it cannot use productively.

We present an adaptive task scheduler for multitasked jobs with dependencies that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our scheduler guarantees that a job completes near-optimally while utilizing at least a constant fraction of the allotted processor cycles. Our scheduler can be applied to schedule data-parallel programs, such as those written in High Performance Fortran (HPF), *Lisp, C*, NESL, and ZPL.

Our analysis models the job scheduler as the task scheduler's adversary, challenging the task scheduler to be robust to the system environment and the job scheduler's administrative policies. For example, the job scheduler can make available a huge number of processors exactly when the job has little use for them. To analyze the performance of our adaptive task scheduler under this stringent adversarial assumption, we introduce a new technique called "trim analysis," which allows us to prove that our task scheduler performs poorly on at most a small number of time steps, exhibiting near-optimal behavior on the vast majority.

To be precise, suppose that a job has work T₁ and critical-path length T∞ and is running on a machine with P processors. Using trim analysis, we prove that our scheduler completes the job in O(T₁/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all time steps, excluding the O(T∞ + L lg P) time steps with the highest processor availability. When T₁/T∞ ≫ P̃ (the job's parallelism dominates the O(T∞ + L lg P)-trimmed availability), the job ach...

This research was supported in part by the Singapore-MIT Alliance and NSF Grant ACI-0324974. Yuxiong He is a Visiting Scholar at MIT CSAIL and a Ph.D. candidate at the National University of Singapore. Wen Jing Hsu is a Visiting Scientist at MIT CSAIL and Associate Professor at Nanyang Technological University.
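To make the trimmed-availability quantity concrete, here is a minimal Python sketch (not from the paper; the function names, the sample numbers, and the choice to drop the constants hidden inside the O(·) terms are all illustrative assumptions) showing how P̃ and the completion-time bound could be evaluated for a given trace of per-step processor availability.

```python
import math

def trimmed_availability(avail, trim_steps):
    """Mean processor availability over all time steps, excluding the
    trim_steps steps with the highest availability (the "trimmed" mean)."""
    kept = sorted(avail)[: max(len(avail) - trim_steps, 1)]
    return sum(kept) / len(kept)

def completion_time_bound(T1, T_inf, L, P, avail):
    """Evaluate the bound O(T1/P~ + T_inf + L lg P) with the constant
    factors inside the O(.) dropped, purely for illustration."""
    trim = T_inf + L * math.ceil(math.log2(P))     # trim O(T_inf + L lg P) steps
    P_trimmed = trimmed_availability(avail, trim)  # the trimmed availability P~
    return T1 / P_trimmed + T_inf + L * math.log2(P)

# Hypothetical availability per time step on a 32-processor machine:
# mostly modest allotments, plus two spikes that the trimming discards.
avail = [4, 6, 5, 32, 7, 30, 6, 5, 8, 7, 6, 5]
print(completion_time_bound(T1=10_000, T_inf=3, L=1, P=32, avail=avail))
```

Discarding the few highest-availability steps is what makes the bound meaningful against an adversarial job scheduler that offers a huge number of processors exactly when the job has little use for them: those spikes cannot inflate the denominator of the T₁/P̃ term.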