Accelerating the bi-objective optimization of applications for performance and energy is crucial to achieving energy efficiency objectives and meeting quality-of-service requirements in modern high-performance computing platforms and cloud computing infrastructures. In this work, we highlight the crucial challenges to accelerate model-based methods proposed for the bi-objective optimization of data-parallel applications for performance and energy that employ workload distribution between the executing processors as the decision variable. The methods solve unconstrained bi-objective optimization problems and take input, the processors' performance and energy profiles in the form of discrete functions of workload size, and output Pareto-optimal solutions (workload distributions), minimizing the execution time and the total energy consumption of computations during the parallel execution of the application. One of the challenges is the fast computation of Pareto-optimal solutions. We then formulate the bi-objective optimization problem of data-parallel applications for performance and energy through workload distribution on a cluster of p identical hybrid nodes, each containing h heterogeneous processors. The state-of-the-art algorithm for solving the problem is sequential and takes exorbitant execution times to find Pareto-optimal solutions for even moderate numbers of processors. We propose two algorithms that address this shortcoming. The first algorithm is an exact sequential algorithm that is more efficient and amenable to parallelization and achieves a complexity reduction of O(m × h) over the state-of-the-art sequential algorithm where m is the cardinality of the input discrete execution time and dynamic energy functions. The second algorithm is a parallel algorithm executed by q identical parallel processes that reduces the complexity of our proposed sequential algorithm by O(q) and therefore achieves a complexity reduction of O(m × h × q) over the state-of-the-art sequential algorithm. Finally, we experimentally analyze the practical efficacy of our proposed algorithms for two data-parallel applications, matrix multiplication and fast Fourier transform, on a heterogeneous hybrid node containing an Intel Haswell multicore CPU, an Nvidia k40c GPU, and an Nvidia P100 GPU and simulations of clusters of such hybrid nodes. The experiments demonstrate that our proposed algorithms provide tremendous speedups over state-of-the-art solutions.INDEX TERMS high-performance heterogeneous computing, energy-efficient computing, biobjective optimization, performance optimization, energy optimization, data-parallel applications, workload distribution I. INTRODUCTION Performance and energy are the two most important objectives for optimization in modern high per-formance computing (HPC) platforms, computational grids, data centers, and cloud computing infrastructures ([1],[2],[3],[4]). Achieving the energy efficiency objectives