Abstract-Sparse linear algebra is a key component of many scientific computations, such as computational fluid dynamics, mechanical engineering, or the design of new materials, to mention only a few. The discretization of complex geometries in unstructured meshes leads to sparse matrices with irregular patterns. Their distribution in turn results in irregular communication patterns within parallel operations. In this paper, we show how sparse linear algebra can be implemented effortlessly on distributed-memory architectures. We demonstrate how simple it is to incorporate advanced partitioning, network topology mapping, and data migration techniques into parallel HPC programs by establishing novel abstractions. For this purpose, we developed a linear algebra library, the Parallel Matrix Template Library 4, based on generic and meta-programming and introducing a new paradigm: meta-tuning. The library establishes its own domain-specific language embedded in C++. The simplicity of software development is not paid for with lower performance. Moreover, the incorporation of topology mapping demonstrated performance improvements of up to 29%.
I. MOTIVATION

Many scientific simulations, such as computational fluid dynamics, mechanical engineering, or the design of new materials, use computations on unstructured grids as their core method (§II-A). The operations are expressed as linear algebra (LA) with sparse matrices. These matrices are very often unstructured, that is, the distribution of non-zero values and the data dependencies of typical operations, such as matrix-vector multiplication, are irregular.

Many large-scale scientific HPC applications can benefit greatly from specialized data structures and domain-specific algorithms operating on them. On the other hand, strongly specialized implementations are very expensive to extend with new algorithms and new data structures.

The introduction of PETSc [1] in the 1990s provided reusable algorithms and data structures for many applications, leading to a significant increase in productivity in scientific software development. We aim to raise productivity further with techniques that did not yet exist when PETSc was created.

The goal is that the linear algebra library adapts itself to the scientific application instead of applications being designed around libraries. Such adaptation can be achieved thanks to the expressiveness and efficiency of the template system of C++ [2] [4]. In this work, we focus on the last two, distributing the unstructured matrices and mapping the resulting communication graph to the network topology. Ideally, these tasks are performed without user assistance, leading to convenient libraries that allow developers to program with intuitive abstractions without sacrificing performance (§II-B to II-F).

Domain-decomposition techniques for structured and unstructured grids have been intensively analyzed, and libraries that provide good decompositions are ready for use. In contrast, mapping those unstructured grids and their corresponding irregular communication topologies onto sta...