Abstract-Many parallel algorithms use hypercubes as the communication topology among their processes. When such algorithms are executed on hypercube multicomputers, the communication cost is kept to a minimum, since processes can be allocated to processors in such a way that only communication between neighbor processors is required. However, the scalability of hypercube multicomputers is constrained by the fact that the interconnection cost per node increases with the total number of nodes. From a scalability point of view, meshes and toruses are more interesting classes of interconnection topologies. This paper focuses on the execution of algorithms with hypercube communication topology on multicomputers with mesh or torus interconnection topologies. The proposed approach is based on looking at different embeddings of hypercube graphs onto mesh or torus graphs. The paper concentrates on toruses, since an already known embedding, called the standard embedding, is optimal for meshes. In this paper, an embedding of hypercubes onto toruses of any given dimension is proposed. This novel embedding is called the xor embedding. The paper presents a set of performance figures for both the standard and the xor embeddings and shows that the latter outperforms the former for any torus. In addition, it is proven that for a one-dimensional torus (a ring) the xor embedding is optimal in the sense that it minimizes the execution time of a class of parallel algorithms with hypercube topology. This class of algorithms is frequently found in real applications, such as FFT and some classes of sorting algorithms.
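To make the cost of running a hypercube algorithm on a ring concrete, the following minimal sketch measures how far apart hypercube neighbors land on a ring. It assumes a naive identity placement (hypercube node x sits at ring position x), which is chosen only for illustration and is not claimed to be the paper's standard or xor embedding; the function names are hypothetical.

```python
def ring_distance(a, b, n):
    """Shortest hop count between positions a and b on a ring of n nodes."""
    diff = abs(a - b)
    return min(diff, n - diff)


def identity_placement(x):
    """Illustrative assumption: hypercube node x is placed at ring position x."""
    return x


def dimension_distances(d, placement):
    """For each hypercube dimension i, return the largest ring distance
    between any node x and its neighbor across dimension i (x with bit i flipped)."""
    n = 1 << d  # a d-dimensional hypercube has 2**d nodes
    distances = []
    for i in range(d):
        worst = max(
            ring_distance(placement(x), placement(x ^ (1 << i)), n)
            for x in range(n)
        )
        distances.append(worst)
    return distances


if __name__ == "__main__":
    # Per-dimension worst-case hop counts for a 4-dimensional hypercube on a 16-node ring.
    print(dimension_distances(4, identity_placement))
```

Under this assumed placement, neighbors across higher dimensions are separated by more hops, which is the kind of communication cost that different embeddings trade off.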
A systematic method to map systolizable problems onto multicomputers is presented in this paper. A systolizable problem is a problem for which it is possible to design a Systolic Algorithm. This method selects the Systolic Algorithm and transforms it into a parallel algorithm with high granularity. The communication requirements are reduced and thus the performance can be increased. The proposed scheme requires a classification of dependences, and it is based on the interleaved execution of several partitions of the Systolic Algorithm. The code to be executed in a processing element of the multicomputer system is obtained by applying the proposed systematic transformations to the original sequential code. The main features of the method are illustrated by applying it to the APP, and several performance measures for a torus of transputers are presented, considering the various algorithms that are unified by the APP.
Postprint (published version)