SUMMARY
Due to its cost-effectiveness, on-demand provisioning, and ease of sharing, cloud computing has become increasingly popular with the research community for deploying scientific applications such as workflows. As this interest grows and scientific workflows are widely deployed in collaborative cloud environments consisting of multiple data centers, there is an urgent need for strategies that place application datasets across globally distributed data centers and schedule tasks according to the resulting data layout, so as to reduce both the latency and the makespan of workflow execution. In this paper, by exploiting the dependencies among datasets and tasks, we propose an efficient data and task coscheduling strategy that places input datasets in a load-balanced way while grouping the most closely related datasets and tasks together. Moreover, data staging is used to overlap task execution with data transmission so that tasks can start earlier. We build a simulation environment on the Tianhe supercomputer to evaluate the proposed strategy and run simulations with both random and realistic workflows. The results demonstrate that the proposed strategy effectively improves scheduling performance while reducing the total volume of data transferred across data centers. Concurrency and Computation: Practice and Experience, 2013. © 2013 Wiley Periodicals, Inc.
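The summary does not spell out the algorithm itself. The Python sketch below only illustrates the general idea of dependency-driven, load-balanced data placement followed by data-aware task assignment; it is not the authors' method. The dependency metric (number of tasks that read both datasets), the greedy ordering, the per-data-center `capacity` bound, and the functions `place_datasets` and `schedule_tasks` are all hypothetical assumptions. Data staging is not modeled.

```python
from collections import defaultdict
from itertools import combinations


def dependency_matrix(tasks):
    """Hypothetical metric: for each ordered pair of datasets, count how many tasks read both."""
    dep = defaultdict(int)
    for inputs in tasks.values():
        for a, b in combinations(sorted(inputs), 2):
            dep[(a, b)] += 1
            dep[(b, a)] += 1
    return dep


def place_datasets(sizes, tasks, num_dcs, capacity):
    """Greedily co-locate strongly dependent datasets without letting any
    data center exceed `capacity` (an assumed load-balance constraint)."""
    dep = dependency_matrix(tasks)
    # Handle the most "connected" datasets first; break ties by name.
    total_dep = {d: sum(dep.get((d, o), 0) for o in sizes if o != d) for d in sizes}
    order = sorted(sizes, key=lambda d: (-total_dep[d], d))
    load = [0] * num_dcs
    members = [[] for _ in range(num_dcs)]
    placement = {}
    for d in order:
        candidates = [c for c in range(num_dcs) if load[c] + sizes[d] <= capacity]
        if not candidates:
            raise ValueError(f"no data center can hold dataset {d}")
        # Prefer the data center sharing the most dependency with d,
        # then the less loaded one, then the lower index.
        best = max(
            candidates,
            key=lambda c: (sum(dep.get((d, o), 0) for o in members[c]), -load[c], -c),
        )
        placement[d] = best
        members[best].append(d)
        load[best] += sizes[d]
    return placement, load


def schedule_tasks(sizes, tasks, placement):
    """Run each task where most of its input volume already resides and
    return the task mapping plus the total volume moved between data centers."""
    assignment, moved = {}, 0
    for t, inputs in sorted(tasks.items()):
        local = defaultdict(int)
        for d in inputs:
            local[placement[d]] += sizes[d]
        dc = max(sorted(local), key=lambda c: local[c])
        assignment[t] = dc
        moved += sum(sizes[d] for d in inputs if placement[d] != dc)
    return assignment, moved


if __name__ == "__main__":
    # Toy workflow with made-up dataset sizes (arbitrary units).
    sizes = {"d1": 4, "d2": 3, "d3": 2, "d4": 5}
    tasks = {"t1": {"d1", "d2"}, "t2": {"d1", "d2", "d3"}, "t3": {"d4"}, "t4": {"d3", "d4"}}
    placement, load = place_datasets(sizes, tasks, num_dcs=2, capacity=8)
    assignment, moved = schedule_tasks(sizes, tasks, placement)
    print(placement, load)  # {'d1': 0, 'd2': 0, 'd3': 1, 'd4': 1} [7, 7]
    print(assignment, moved)
```

In this toy run, `d1` and `d2` (read together by two tasks) end up in the same data center, the capacity bound keeps both sites equally loaded, and only `d3` has to cross data centers, for task `t2`. A real coscheduler would also weigh link bandwidth, task order, and staging overlap, which this sketch ignores.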