Data partitioning with a realistic performance model of networks of heterogeneous computers

Lastovetsky, Alexey; Reddy, Ravi

doi:10.1109/ipdps.2004.1303051

Cited by 33 publications

(51 citation statements)

References 3 publications

“…Figure 5 (a) shows the speedup of the matrix-matrix multiplication executed on this network using the advanced model over the matrix-matrix multiplication using the modified version of the standard model that determines the speed of the processor based on the multiplication of two dense 500×500 matrices and two dense 4000×4000 matrices. For problem sizes beyond 24000, the figure shows that the distribution given by the performance model [1] will result in failure of the application. For these problem sizes, the modified performance model is used to obtain optimal distribution.…”

Section: Results (mentioning)

confidence: 99%

“…Some of the issues with programming applications on such networks of heterogeneous computers have been explained in [1]. These are mainly:…”

Section: Index (mentioning)

confidence: 99%

“…In this case, the speed of the processors is more realistically represented by a continuous and relatively smooth function of the problem size. The performance model discussed in [1] can be used to efficiently schedule arbitrary tasks on such networks of heterogeneous computers when one or more arbitrary tasks do not fit into the main memory of the processors. This model particularly addresses the problem of optimal data partitioning in heterogeneous environments when relative speeds of processors cannot be accurately approximated by constant functions of the problem size.…”

Section: Index (mentioning)

confidence: 99%
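
The idea in the statement above, that a processor's speed is a function of problem size rather than a constant, can be illustrated with a short sketch. This is not the algorithm of [1]. It is a plain bisection that assumes each hypothetical speed function `s(x)` is positive and that the execution time `x / s(x)` grows with `x`, and it returns real-valued shares without rounding them to integers.

```python
from typing import Callable, List


def partition_by_speed_functions(
    n: int, speeds: List[Callable[[float], float]]
) -> List[float]:
    """Split n elements so that all processors finish at about the same time.

    Assumption: processor i needs x / speeds[i](x) time units for x elements,
    and that time increases with x.
    """

    def load_for_time(s: Callable[[float], float], t: float) -> float:
        # Largest x in [0, n] whose execution time x / s(x) does not exceed t.
        lo, hi = 0.0, float(n)
        for _ in range(100):
            mid = (lo + hi) / 2
            if mid / s(mid) <= t:
                lo = mid
            else:
                hi = mid
        return lo

    # At t_hi even the slowest processor could take all n elements alone,
    # so the total load that fits within t_hi is at least n.
    t_lo, t_hi = 0.0, max(n / s(n) for s in speeds)
    for _ in range(100):
        t = (t_lo + t_hi) / 2
        if sum(load_for_time(s, t) for s in speeds) < n:
            t_lo = t
        else:
            t_hi = t
    return [load_for_time(s, t_hi) for s in speeds]
```

With two constant speed functions in the ratio 1:3, the sketch reduces to the familiar proportional split; a speed function that drops as `x` grows shifts load away from that processor.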

“…For these problem sizes, any distribution obtained by this model will most likely either crash the processor whose speed is represented by the speed function s_2(x) or result in unacceptable execution time to execute the subtask assigned to this processor. The advanced performance model retains the restrictions imposed by performance model [1] on the shape of the graph representing the speed function. However, each processor is represented by its absolute speed as a continuous function of problem size only up to its upper bound on the problem size, and beyond that, the absolute speed of the processor is assumed to be almost equal to zero.…”

Section: Data Partitioning With a Realistic Performance Model Of Netw… (mentioning)

confidence: 99%
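
The size-limited speed function described in the statement above can be sketched as a small wrapper. The names `with_size_limit`, `max_size`, and `eps` are hypothetical, and the value used for "almost equal to zero" is an assumption made for illustration only.

```python
from typing import Callable


def with_size_limit(
    speed: Callable[[float], float], max_size: float, eps: float = 1e-9
) -> Callable[[float], float]:
    """Wrap a speed function so it collapses beyond the processor's size limit.

    Up to max_size the original speed is used; beyond it the speed is
    treated as almost zero (eps is an assumed placeholder value).
    """

    def bounded(x: float) -> float:
        return speed(x) if x <= max_size else eps

    return bounded
```

Any partitioner that equalizes execution times then avoids assigning more than `max_size` elements to that processor, because the predicted time beyond the limit becomes enormous.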

“…Partition the set such that the number of elements in each partition is proportional to the speed of the processor and assuming no upper bound exists on the number of elements that can be stored by the processor. The partitioning algorithm used to perform this task is discussed in [1]. If the number of elements in each partition assigned to each processor is less than the upper bound on the number of elements that can be stored by the processor, we have an optimal distribution.…”

Section: Algorithms For Partitioning Sets (mentioning)

confidence: 99%
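
The first step quoted above, a split proportional to speed followed by a check against each processor's storage limit, can be sketched as follows. The quoted passage breaks off before saying what happens when a limit is exceeded, so the fallback here (cap the overloaded processors at their limits and redistribute the rest) is an assumption. Constant speeds are also an assumption made for brevity; the algorithm in [1] works with speed functions.

```python
from typing import List


def partition_with_limits(
    n: int, speeds: List[float], limits: List[float]
) -> List[float]:
    """Proportional split of n elements, respecting per-processor upper bounds."""
    if n > sum(limits):
        raise ValueError("n exceeds the total capacity of all processors")

    shares = [0.0] * len(speeds)
    free = set(range(len(speeds)))  # processors not yet capped
    remaining = float(n)

    while free:
        total_speed = sum(speeds[i] for i in free)
        # Trial split proportional to speed, ignoring the limits.
        trial = {i: remaining * speeds[i] / total_speed for i in free}
        over = [i for i in free if trial[i] > limits[i]]
        if not over:
            # Every share fits under its bound, so the trial split stands.
            for i in free:
                shares[i] = trial[i]
            break
        # Assumed fallback: cap overloaded processors, then redistribute.
        for i in over:
            shares[i] = limits[i]
            remaining -= limits[i]
            free.remove(i)
    return shares
```

For example, `partition_with_limits(100, [1, 1, 2], [100, 100, 30])` caps the fastest processor at 30 elements and splits the remaining 70 evenly between the other two.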

Data Partitioning with a Realistic Performance Model of Networks of Heterogeneous Computers with Task Size Limits

Lastovetsky

Reddy

Third International Symposium on Parallel and Distributed Computing/Third International Workshop on Algorithms, Models and Tool

Self Cite

Design of self‐adaptable data parallel applications on multicore clusters automatically optimized for performance and energy through load distribution

Manumachu

Lastovetsky

2018

Concurrency and Computation

Self-adaptability is a highly preferred feature in HPC applications. A crucial building block of a self-adaptable application is a data partitioning algorithm that must possess several essential qualities apart from low runtime and memory costs. On modern platforms composed of multicore CPU processors, data partitioning algorithms striving to solve the bi-objective optimization problem for performance and energy (BOPPE) face a formidable challenge. They must take into account the new complexities inherent in these platforms, such as severe resource contention and non-uniform memory access (NUMA). Novel model-based methods and data partitioning algorithms have been proposed that address the challenge. However, these methods take as input full functional performance and energy models (FPM and FEM), which have prohibitively high model construction costs. Therefore, they are not suitable for employment in self-adaptable applications. In this paper, we present a self-adaptable data partitioning algorithm called ADAPTALEPH, which solves BOPPE on homogeneous clusters of multicore CPUs. Unlike state-of-the-art methods for solving BOPPE, which take full FPM and FEM as inputs, it constructs partial FPM and FEM during its execution using all the available processors. It returns a locally Pareto-optimal set of solutions, which are the heterogeneous workload distributions that achieve inter-node optimization of data-parallel applications for performance and energy. We experimentally study the efficiency of ADAPTALEPH for three data-parallel applications, i.e., matrix-vector multiplication, matrix-matrix multiplication, and fast Fourier transform, on a modern multicore CPU and in simulations for homogeneous clusters of such CPUs. We demonstrate that the locally Pareto-optimal front approaches the globally Pareto-optimal front as the number of points in the partial discrete FPM and FEM functions is increased. The number of points in the partial FPM/FEM at which the locally Pareto-optimal front becomes the globally Pareto-optimal front is considerably smaller than the number of points in the full FPM/FEM, which suggests developing methods that can leverage this finding to drastically reduce model construction times.
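
The notion of a Pareto-optimal set for performance and energy used in this abstract can be illustrated with a short filter. This is not ADAPTALEPH; it only assumes a hypothetical list of candidate workload distributions, each already evaluated as an (execution time, energy) pair where smaller is better in both.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (time, energy) points that no other point dominates."""
    front = []
    best_energy = float("inf")
    # Scanning in order of increasing time, a point is non-dominated only
    # if its energy is strictly lower than every energy seen so far.
    for time, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((time, energy))
            best_energy = energy
    return front
```

Each point kept by the filter represents a trade-off: moving along the front lowers energy only at the cost of a longer execution time.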

Efficient exact algorithms for continuous bi‐objective performance‐energy optimization of applications with linear energy and monotonically increasing performance profiles on heterogeneous high performance computing platforms

Khaleghzadeh

Manumachu

Lastovetsky

2022

Concurrency and Computation

Performance and energy are the two most important objectives for optimization on heterogeneous high performance computing platforms. This work studies a mathematical problem motivated by the bi-objective optimization of data-parallel applications on such platforms for performance and energy. First, we formulate the problem and present an exact algorithm of polynomial complexity solving the problem where all the application profiles of objective type one are continuous and strictly increasing, and all the application profiles of objective type two are linear increasing. We then apply the algorithm to develop solutions for two related optimization problems of parallel applications on heterogeneous hybrid platforms, one for performance and dynamic energy and the other for performance and total energy. Our proposed solution methods are then employed to solve the two bi-objective optimization problems for two data-parallel applications, matrix multiplication and gene sequencing, on a hybrid platform employing five heterogeneous processors, namely, two different Intel multicore CPUs, an Nvidia K40c GPU, an Nvidia P100 PCIe GPU, and an Intel Xeon Phi.
