Abstract. Heterogeneity complicates the efficient use of multicomputer platforms, but does it enhance their performance or their cost effectiveness? How can one measure the power of a heterogeneous assemblage of computers ("cluster," for short), both in absolute terms (how powerful is this cluster) and in relative terms (which cluster is the most powerful)? What makes one cluster more powerful than another? Is one better off with a cluster that has one super-fast computer and the rest of just "average" speed, or with a cluster all of whose computers are "moderately" fast? If you could replace just one computer in your cluster with a faster one, which computer would you choose: the fastest or the slowest? How does one even ask questions such as these in a formal, yet tractable, manner? A framework is proposed, and some answers are derived, a few of them rather surprising. Three highlights: (1) If one can replace only one computer in a cluster by a faster one, it is provably (almost) always most advantageous to replace the fastest one. (2) If the computers in two clusters have the same mean speed, then, empirically, the cluster with the larger variance in speed is (almost) always the faster one. (3) Heterogeneity can actually lend power to a cluster!
Motivation and Background

Modern multicomputer platforms are heterogeneous: their constituent computers vary in computational power, and they often intercommunicate over layered networks of varying speeds [12]. One observes substantial heterogeneity in modern platforms such as clusters [2,21] and modalities of Internet-based computing [20] such as grid computing [9,14], global computing [11], volunteer computing [16], and cloud computing [10]. The difficulty of scheduling complex computations on heterogeneous platforms greatly complicates the challenge of high-performance computing in modern environments. In 1994, the first author noted the need for better understanding of the scheduling implications of heterogeneity via rigorous analyses [23]. There has since been an impressive amount of first-rate work on this topic, focusing largely on collective communication [3,4,8,15,17,22,24] but also studying important scheduling issues [1,5,6,7,13,18]. That said, sources such as [1] show that there is still much to learn about this important topic, including the questions in the abstract.

This research was supported in part by NSF Grants CNS-0615170 and CNS-0905399.