being very popular. The shape of mesh elements significantly impacts the efficiency and accuracy of simulation codes. The problem of improving element quality in unstructured tetrahedral meshes and triangular surface meshes is classically framed as an optimization problem. In this framework, the positions of the mesh vertices are adjusted to optimize element quality. In this paper, we explore how modern computer hardware, meaning heterogeneous many-core systems, accelerates these optimization computations. The reward for optimizing meshes faster is significant in practical terms, with fewer simulations failing due to poor mesh quality and a faster engineering design cycle.

The wide availability of massively multi-threaded graphics processing units (GPUs) and multi-core CPUs offers a new direction for the acceleration of mesh optimization algorithms. Our work shows how optimization is effectively accomplished on a per-vertex basis on heterogeneous systems, exposing fine-grained parallelism in the same manner as Freitag et al. [4] did for more traditional parallel architectures. Our algorithm specifically seeks to reduce the maximum and average inverse mean ratio, which detects irregular and inverted simplex elements [5,6,15,18]. The framework of the algorithm is very flexible and accommodates essentially any underlying quality metric and core numerical optimization method.

The main contribution of this paper is an examination of the effectiveness of a hybrid parallel optimization scheme as opposed to GPU-only parallelism for mesh optimization. In addition, we compare the performance of three different core numerical optimization techniques on a Kepler-class GPU. We describe in detail the use of the derivative-free Nelder-Mead simplex method, which in a previous work was seen to converge faster than several peer optimization methods for this application [22].
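To make the per-vertex formulation concrete, the following is a minimal serial sketch, not the implementation evaluated in this paper. It assumes a two-dimensional patch of triangles around one free vertex rather than tetrahedra, uses the triangle form of the inverse mean ratio, (l1^2 + l2^2 + l3^2) / (4 sqrt(3) A) with signed area A, and minimizes the maximum over the patch with a basic Nelder-Mead iteration. The names `inverse_mean_ratio`, `patch_quality`, and `nelder_mead`, the step size, and the iteration count are all hypothetical choices made for illustration.

```python
import math

def inverse_mean_ratio(p0, p1, p2):
    """Inverse mean ratio of a 2D triangle with counter-clockwise vertices.

    Equals 1 for an equilateral triangle and grows as the shape degrades;
    degenerate or inverted (clockwise) triangles return infinity.
    """
    e1 = (p1[0] - p0[0], p1[1] - p0[1])
    e2 = (p2[0] - p0[0], p2[1] - p0[1])
    twice_area = e1[0] * e2[1] - e1[1] * e2[0]   # signed, 2 * area
    if twice_area <= 0.0:
        return math.inf
    e3 = (e2[0] - e1[0], e2[1] - e1[1])
    sum_sq = sum(c * c for e in (e1, e2, e3) for c in e)
    return sum_sq / (2.0 * math.sqrt(3.0) * twice_area)

def patch_quality(p, ring):
    """Worst (maximum) inverse mean ratio over the triangles incident to p."""
    m = len(ring)
    return max(inverse_mean_ratio(p, ring[k], ring[(k + 1) % m]) for k in range(m))

def nelder_mead(f, x0, step=0.1, iterations=200):
    """Basic Nelder-Mead: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    values = [f(x) for x in simplex]
    for _ in range(iterations):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        centroid = [sum(x[j] for x in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)
        fr = f(xr)
        if fr < values[0]:
            xe = along(2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            xc = along(-0.5)
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, s)] for s in simplex[1:]
                ]
                values = [values[0]] + [f(s) for s in simplex[1:]]
    i_best = min(range(n + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]

# Hypothetical patch: a regular hexagon ring with an off-centre free vertex.
ring = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
start = [0.4, 0.3]
objective = lambda p: patch_quality(p, ring)
before = objective(start)
vertex, after = nelder_mead(objective, start)
print(f"max inverse mean ratio: {before:.4f} -> {after:.4f}")
```

Because inverted elements evaluate to infinity, the search never accepts a vertex position that tangles the patch. Each such patch problem touches only one vertex and its incident elements, which is the kind of small, self-contained work item that could plausibly be assigned to a single thread; the tetrahedral analogue of the metric would replace the triangle formula with the corresponding three-dimensional form.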
In our new results, Nelder-Mead continues to demonstrate superiority

Abstract

We describe a parallel algorithmic framework for optimizing the shape of elements in a simplicial volume mesh. Using fine-grained parallelism and asymmetric multiprocessing on multi-core CPU and modern graphics processing unit hardware simultaneously, we achieve speedups of more than tenfold over current state-of-the-art serial methods. In addition, improved mesh quality is obtained by optimizing both the surface and the interior vertex positions in a single pass, using feature preservation to maintain fidelity to the original mesh geometry. The framework is flexible in terms of the core numerical optimization method employed, and we provide performance results for both gradient-based and derivative-free optimization methods.