Because of the increasing number of cores of current parallel machines and the growing need for a concurrent execution of tasks, the problem of parallel task scheduling is more relevant than ever, especially under the moldable task model, in which tasks are allocated to a fixed number of processors before execution. Much research has been conducted to develop efficient scheduling algorithms for moldable tasks, both in theory and practice. The problem is that theoretical and practical approaches expose shortcomings, for example, many approximation algorithms only guarantee bounds under assumptions, which are unrealistic in practice, or most heuristics have not been rigorously compared with competing approximation algorithms. In particular, it is often assumed that the speedup function of moldable tasks is either non-decreasing, sublinear, or concave. In practice, however, the resulting speedup of parallel programs on current hardware with deep memory hierarchies is most often neither non-decreasing nor concave. We present a new algorithm for the problem of scheduling moldable tasks with precedence constraints for the makespan objective and for arbitrary speedup functions. We show through simulation that the algorithm not only creates competitive schedules for moldable tasks with arbitrary speedup functions but also outperforms other published heuristics and approximation algorithms for non-decreasing speedup functions.We have conducted three different simulation studies, which are part of the overall evaluation. First, we compare the schedules produced by CPA13 and its competitors for the non-increasing and convex run-time model. Second, we address the scalability of these algorithms by increasing the number