Summary

Nested loops are the largest source of parallelism in many data-parallel scientific applications. Heterogeneous distributed systems are popular computing platforms for data-parallel applications. Data partitioning is critical in exploiting the computational power of such systems, and existing data partitioning algorithms try to maximize performance of data-parallel applications by finding a data distribution that balances the workload between the processing nodes while minimizing communication costs. This paper addresses the problem of 3-dimensional data partitioning for 3-level perfectly nested loops on heterogeneous distributed systems. The primary aim is to minimize the execution time by improving the load balancing and minimizing the internode communications. We propose a new data partitioning algorithm using dynamic programming, build a theoretical model to estimate the execution time of each partition, and select a partition with minimum execution time as a near-optimal solution. We demonstrate the effectiveness of the new algorithm for 2 data-parallel scientific applications on heterogeneous distributed systems. The new algorithm reduces the execution time by between 7% and 17%, on average, compared with leading data partitioning methods on 3 heterogeneous distributed systems.

architecture and the need to meet ever-increasing computing needs of scientific applications, the computational capacity of the clusters is often increased [8,9]. As a first approach, we could replace all processing nodes with newer, faster ones. In this case, the cluster remains homogeneous over time, but a complete replacement of all nodes can be very costly. As a second approach, we could upgrade the cluster by adding more processing nodes that use a newer technology with higher speed. Also, we could aggregate several clusters together to use their computational power for solving computing problems [8]. Another approach is adding graphics processing units (GPUs) to improve performance of existing nodes [9]. In the latter 3 cases, the cluster becomes heterogeneous [10][11][12]. In this way, heterogeneous computing systems have emerged as an important contribution to provide computational capacity in high-performance computing. In fact, the prevalence of heterogeneous systems in the TOP500 list grew from 3.4% to 18.0%