“…In response to the design of general‐purpose processors (or CPUs) with a moderate number of cores, a series of efforts have demonstrated the benefits of extracting task parallelism for dense linear algebra operations: PLASMA [1,2], libFLAME [3,4], StarPU [5,6], and OmpSs [7,8]. Following this trend, processor architectures for high performance computing (HPC) have evolved over the past few years to integrate a very large number of cores so that, nowadays, CPUs (e.g., from Intel, AMD and ARM) with 16–64 cores are not uncommon in HPC servers.…”