Data Partitioning with a Functional Performance Model of Heterogeneous Processors

Lecture Notes in Computer Science

Reddy

2010

Self Cite

Abstract. The functional performance model (FPM) of heterogeneous processors has proven to be more realistic than the traditional models because it integrates many important features of heterogeneous processors such as the processor heterogeneity, the heterogeneity of memory structure, and the effects of paging. Optimal 1D matrix partitioning algorithms employing FPMs of heterogeneous processors are already being used in solving complicated linear algebra kernels such as dense factorizations. However, 2D matrix partitioning algorithms for parallel computing on heterogeneous processors based on their FPMs are unavailable. In this paper, we address this deficiency by presenting a novel iterative algorithm for partitioning a dense matrix over a 2D grid of heterogeneous processors and employing their 2D FPMs. Experiments with a parallel matrix multiplication application on a local heterogeneous computational cluster demonstrate the efficiency of this algorithm.

Section: Data Partitioning Algorithm (Dpa-fpm-2d)

mentioning

confidence: 99%

Section: Introduction

mentioning

confidence: 99%

Two-Dimensional Matrix Partitioning for Parallel Computing on Heterogeneous Processors Based on Their Functional Performance Models

Lecture Notes in Computer Science

Reddy

2010

Self Cite

“…It has been shown in [13] that it is more accurate to represent performance as a function of problem size, which reflects contributions from both processor and memory. In this paper, we propose a new dynamic load balancing algorithm based on partial functional performance models of processors [3].…”

Section: Related Work

mentioning

confidence: 99%

“…Our dynamic load balancing algorithm is based on functional performance models [13], which are application centric and hardware specific. Functional performance models reflect both processor and memory heterogeneity.…”

mentioning

confidence: 99%

Dynamic Load Balancing of Parallel Computational Iterative Routines on Platforms with Memory Heterogeneity

Clarke

Euro-Par 2010 Parallel Processing Workshops

Рычков

2011

Self Cite

Abstract. Traditional load balancing algorithms for data-intensive iterative routines can successfully load balance relatively small problems. We demonstrate that they may fail for large problem sizes on computational clusters with memory heterogeneity. Traditional algorithms use too simplistic models of processors' performance which cannot reflect many aspects of heterogeneity. This paper presents a new dynamic load balancing algorithm based on the advanced functional performance model. The model consists of speed functions of problem size, which are built adaptively from a history of load measurements. Experimental results demonstrate that our algorithm can successfully balance data-intensive iterative routines on parallel platforms with memory heterogeneity.

“…This type of parallel application is often used in practice, for example, in processing of a large amount of image data collected from the hyperspectral sensors on airborne/satellite platforms (Plaza et al 2006). Our application multiplies two dense square matrices, C = A × B, and employs a simple heterogeneous parallel algorithm based on one-dimensional matrix partitioning (see, for example, Lastovetsky and Reddy 2007). As shown in Figure 5, the matrices A and C are horizontally sliced such that the number of elements in a slice is proportional to the speed of the processor owning the slice.…”
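The horizontal slicing described in this excerpt can be illustrated with a minimal sketch. It assumes each processor is characterised by a single constant speed (the traditional model, not a functional performance model); the function names `partition_rows` and `row_slices` and the largest-remainder rounding are illustrative choices, not taken from the cited papers.

```python
def partition_rows(n_rows, speeds):
    """Split n_rows among processors in proportion to their speeds.

    Assumption: each speed is one positive constant per processor.
    Returns a list of row counts that sums exactly to n_rows.
    """
    if n_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n_rows >= 0 and positive speeds")
    total = sum(speeds)
    # Ideal (fractional) share of rows for each processor.
    ideal = [n_rows * s / total for s in speeds]
    # Start from the floor of each ideal share.
    counts = [int(x) for x in ideal]
    leftover = n_rows - sum(counts)
    # Give the remaining rows to the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def row_slices(counts):
    """Turn row counts into contiguous half-open (start, stop) ranges."""
    slices, start = [], 0
    for c in counts:
        slices.append((start, start + c))
        start += c
    return slices


if __name__ == "__main__":
    counts = partition_rows(100, [1.0, 2.0, 1.0])
    print(counts)
    print(row_slices(counts))
```

Each processor would then own the rows of A and C in its range. Because every row has the same number of elements, making the row counts proportional to speed also makes the element counts proportional.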

Section: N Estimation Of Communication Models

mentioning

confidence: 99%

Accurate and Efficient Estimation of Parameters of Heterogeneous Communication Performance Models

The International Journal of High Performance Computing Applica

Рычков

2009

Self Cite

Analytical predictive communication models play an important role in the optimization of communication operations in scientific applications running on computational clusters. The effectiveness of this model-based optimization strongly depends on the accuracy of the estimation of the parameters of these models. The task of accurate estimation of the model is particularly challenging for heterogeneous communication models that use a much larger number of point-to-point parameters than their homogeneous counterparts. One particular challenge occurs when the number of point-to-point parameters describing communication between a pair of processors becomes larger than the number of independent point-to-point communication experiments traditionally used for estimation of the parameters. In this paper, we address this and other related issues and propose an approach that allows us to design a set of communication experiments sufficient for the accurate and efficient estimation of the parameters of a heterogeneous communication performance model. The experiments on heterogeneous clusters demonstrate the accuracy and efficiency of the proposed solution.