Data parallel applications are being extensively deployed in cloud environments because of the possibility of dynamically provisioning storage and computation resources. To identify cost-effective solutions that satisfy the desired service levels, resource provisioning and scheduling play a critical role. Nevertheless, the unpredictable behavior of cloud performance makes it difficult to estimate the resources actually needed. In this paper we propose a provisioning and scheduling framework that explicitly tackles the uncertainties and performance variability of the cloud infrastructure and of the workload. This framework allows cloud users to estimate in advance, i.e., prior to the actual execution of the applications, the resource settings that cope with uncertainty. We formulate an optimization problem in which the characteristics that are not perfectly known or are affected by uncertain phenomena are represented as random variables modeled by the corresponding probability distributions. Provisioning and scheduling decisions, while optimizing various metrics such as the monetary leasing costs of cloud resources and the application execution time, take full account of the uncertainties encountered in cloud environments. To test our