Cloud Computing is emerging as a major trend in the ICT industry. However, as with any new technology, major challenges lie ahead, one of them concerning resource provisioning. Indeed, modern Cloud applications deal with a dynamic context that requires a continuous adaptation process to meet a satisfactory Quality of Service (QoS), but even the leading Cloud platforms provide only simple rule-based tools; the rudimentary autoscaling mechanisms they offer may be unsuitable in many situations, as they do not prevent SLA violations but only react to them. In addition, such approaches are inherently static: they cannot capture the dynamic behavior of the application, and they are unsuitable for managing the multi-Cloud/data center deployments required by mission-critical services. This situation calls for advanced solutions designed to provision Cloud resources in a predictive and dynamic way. This work presents capacity allocation algorithms whose goal is to minimize the total execution cost while satisfying constraints on the average response time of multi-Cloud based applications. The paper proposes a joint load balancing and receding horizon capacity allocation technique that can handle multiple classes of requests. An extensive evaluation of the proposed solution is provided against an Oracle with perfect knowledge of the future and against well-known heuristics proposed in the literature. The analysis shows that our solution outperforms the heuristics, producing results very close to the optimal ones and reducing the number of QoS violations (in the worst case, the QoS constraint violation rate is 4.259%, versus up to 17.245% for the other approaches, and it can easily be reduced by roughly a factor of four by exploiting the receding horizon approach). Furthermore, a sensitivity analysis over two different time scales indicates that finer-grained time scales are more appropriate for spiky workloads.
Analytical results are validated through simulation, which is also used to analyze the impact of random perturbations in the Cloud environment. Finally, experiments on a prototype environment demonstrate the effectiveness of the proposed approach under real workloads.