Many scheduling algorithms have been devised for nested loops with and without dependencies on general heterogeneous distributed systems ([1] and references therein). However, none addresses the case of dynamically computing and allocating chunks of non-independent tasks to processors. We propose a theoretical model that yields a function estimating the parallel time of loops with dependencies on heterogeneous systems. We show that the minimum parallel time is obtained at the synchronization frequency that minimizes this function. The accuracy of the model is validated through experiments with a practical application. For more details, refer to [2].

To find the optimal synchronization frequency, we build a theoretical model for heterogeneous dedicated systems, in which workers have different computational powers. Loops with dependencies are efficiently scheduled on heterogeneous systems with self-scheduling algorithms [1]. Self-scheduling algorithms are based on the master-worker model: the master assigns work to workers upon request. Due to the data dependencies, applying self-scheduling algorithms to loops with dependencies yields a pipelined parallel execution. In the case of one master and NP workers, each assignment round corresponds to a pipeline with NP stages.

Our approach assumes that the nested loop is represented in a Cartesian space with at least two dimensions. One dimension is partitioned by the master into chunks according to a self-scheduling algorithm. In a pipeline organization, each worker synchronizes with its neighbors; thus, synchronization points are inserted along the other dimension. A synchronization interval, denoted by h, represents the number of elements of the index space along the synchronization dimension between successive synchronization points. Data produced at the end of one pipeline are fed to the next pipeline. Clearly, the synchronization frequency plays an important role in the total parallel time.
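The partitioning just described can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the proportional-to-speed chunk rule and all names (partition_chunks, sync_points, n_rows, n_cols) are assumptions chosen for concreteness.

```python
# Hypothetical sketch: split one loop dimension into one chunk per worker
# (here, sizes proportional to relative worker speed -- an illustrative
# self-scheduling rule, not the paper's), and place synchronization
# points every h elements along the other (synchronization) dimension.

def partition_chunks(n_rows, speeds):
    """Return (start, end) bounds of one chunk per worker,
    with chunk sizes proportional to each worker's speed."""
    total = sum(speeds)
    sizes = [round(n_rows * s / total) for s in speeds]
    sizes[-1] = n_rows - sum(sizes[:-1])  # absorb rounding error in last chunk
    bounds, start = [], 0
    for size in sizes:
        bounds.append((start, start + size))
        start += size
    return bounds

def sync_points(n_cols, h):
    """Synchronization points every h elements along the sync dimension."""
    return list(range(h, n_cols + 1, h))

chunks = partition_chunks(100, [1.0, 2.0, 1.0])
points = sync_points(40, 10)
print(chunks)  # [(0, 25), (25, 75), (75, 100)]
print(points)  # [10, 20, 30, 40]
```

Each worker then computes its chunk strip by strip, exchanging boundary data with its pipeline neighbors at every synchronization point.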
Frequent synchronization implies excessive communication, whereas infrequent synchronization may limit the parallelism.

To estimate the theoretical parallel time on a heterogeneous system for the case of multiple assignment rounds (pipelines), i.e., when the number of processors is smaller than the total number of chunks, we assume that a problem of the original index space size can be decomposed into p subproblems (pipelines) of (equal) size, in each of which every processor is assigned one chunk. Thus, one subproblem corresponds to one assignment round. These subproblems are inter-dependent in the sense that (part of) the data produced by one subproblem is consumed by the next subproblem. Upon completion of one subproblem, the processor assigned the last chunk of that subproblem transmits (in a single message) all necessary data to the processor assigned the first chunk of the next subproblem. The time to complete this data transfer represents the time to send and receive a data packet of size equal to the size of the synchronization dimension. Hence, the theoretical parallel ...
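The trade-off governing the synchronization interval can be made concrete with a small timing model. This is an illustrative sketch under stated assumptions, not the paper's formula: it assumes a fixed per-synchronization communication cost c, strip-wise progress along the synchronization dimension, and a start condition in which each worker's strip waits for its own previous strip and for the upstream worker's matching strip.

```python
# Illustrative pipeline-time model (hypothetical, not the paper's exact
# derivation): worker i processes its chunk in strips of h elements along
# the synchronization dimension; strip j of worker i may start only after
# worker i finishes strip j-1 and worker i-1 hands over strip j, paying a
# communication cost c per synchronization.

def pipeline_time(chunk_sizes, speeds, n_cols, h, c):
    n_strips = -(-n_cols // h)  # ceiling division
    finish = [[0.0] * n_strips for _ in chunk_sizes]
    for j in range(n_strips):
        width = min(h, n_cols - j * h)  # last strip may be narrower
        for i, (size, speed) in enumerate(zip(chunk_sizes, speeds)):
            ready = finish[i][j - 1] if j > 0 else 0.0
            if i > 0:
                ready = max(ready, finish[i - 1][j] + c)
            finish[i][j] = ready + size * width / speed
    return finish[-1][-1]

# Small h fills the pipeline sooner but pays c on every synchronization;
# large h saves communication but delays downstream workers. The model
# exposes the trade-off that the optimal synchronization frequency balances.
t_small = pipeline_time([25, 50, 25], [1.0, 2.0, 1.0], 40, 5, 1.0)
t_large = pipeline_time([25, 50, 25], [1.0, 2.0, 1.0], 40, 40, 1.0)
```

Minimizing such a function over h is the role played by the theoretical model: the optimal synchronization frequency is the one at which the estimated parallel time is smallest.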