The Star Superscalar (StarSs) programming model converts a sequential application written in C or Fortran into an efficient parallel program. The resulting parallel code is highly dynamic in the sense that data-dependence analysis and task scheduling occur at run time, while the application executes. In this paper, we compare this approach with the strategies adopted by other multicore programming environments. The price to pay for dynamic scheduling and dependence tracking is higher runtime overhead. We propose a distributed scheduler for Task Dependence Graphs (TDGs) to reduce the scheduling cost on heterogeneous multicore architectures. This scheduler allows the cores to speculatively select tasks from a conservative estimate of the TDG. In case of a conflict or a lack of tasks, a lightweight centralized scheduler services the faulting core, after which that core resumes its participation in the distributed scheme. Experiments with Cell Superscalar (CellSs) on a representative set of benchmarks demonstrate the reduction in runtime overhead achieved by the distributed scheduler. This reduction translates directly into a performance improvement for a large fraction of the benchmarks.
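To make the interplay between speculative selection and the centralized fallback concrete, the following is a minimal single-threaded sketch in Python. It is an illustration under stated assumptions, not the CellSs implementation: the names `CentralScheduler`, `Core`, `try_claim`, `service`, and `run` are hypothetical, the per-core list of candidate tasks merely stands in for the estimate of the TDG, tasks are assumed to complete instantly, and the graph is assumed to be acyclic.

```python
class CentralScheduler:
    """Authoritative view of the task dependence graph (hypothetical sketch)."""

    def __init__(self, deps):
        # deps maps each task to the set of tasks it depends on.
        self.deps = {task: set(preds) for task, preds in deps.items()}
        self.claimed = set()
        self.done = set()
        self.fallbacks = 0  # number of times the slow path was taken

    def ready(self):
        """Unclaimed tasks whose predecessors have all finished."""
        return sorted(t for t, preds in self.deps.items()
                      if t not in self.claimed and preds <= self.done)

    def try_claim(self, task):
        """Stand-in for an atomic claim; fails if taken or not yet ready."""
        if task in self.claimed or not self.deps[task] <= self.done:
            return False
        self.claimed.add(task)
        return True

    def service(self, core):
        """Slow path: hand the core a task and refresh its local estimate."""
        self.fallbacks += 1
        ready = self.ready()
        if not ready:
            core.estimate = []
            return None
        self.claimed.add(ready[0])
        core.estimate = ready[1:]
        return ready[0]


class Core:
    def __init__(self, ident, central):
        self.ident = ident
        self.central = central
        self.estimate = []  # local, possibly stale list of candidate tasks

    def next_task(self):
        # Fast path: speculatively pick a candidate from the local estimate.
        # Even and odd cores pick from opposite ends to reduce conflicts.
        if self.estimate:
            task = self.estimate.pop(0 if self.ident % 2 == 0 else -1)
            if self.central.try_claim(task):
                return task
        # Conflict or lack of tasks: ask the centralized scheduler.
        return self.central.service(self)


def run(deps, n_cores=2):
    """Simulate round-robin cores; assumes an acyclic graph."""
    central = CentralScheduler(deps)
    cores = [Core(i, central) for i in range(n_cores)]
    order = []
    while len(central.done) < len(deps):
        for core in cores:
            task = core.next_task()
            if task is not None:
                order.append((core.ident, task))
                central.done.add(task)  # the task "executes" instantly
    return order, central.fallbacks
```

For example, `run({"a": set(), "b": {"a"}, "c": {"a"}, "d": {"a"}, "e": {"a"}, "f": {"a"}})` executes every task exactly once and after its predecessor. In this toy model the cores reach the centralized scheduler only when their local list is empty or a speculative pick fails, so most tasks in the wide second level are taken on the fast path.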