A task-based formulation of the Scalable Universal Matrix Multiplication Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is applied to the multiplication of hierarchy-free, rank-structured matrices that appear in the domain of quantum chemistry (QC). The novel features of our formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and (2) fine-grained task-based composition. These features make it tolerant of the load imbalance due to the irregular matrix structure and eliminate all artifactual sources of global synchronization. Scalability of the iterative computation of the square-root inverse of block-rank-sparse QC matrices is demonstrated; for full-rank (dense) matrices the performance of our SUMMA formulation usually exceeds that of the state-of-the-art dense MM implementations (ScaLAPACK and Cyclops Tensor Framework).

Footnote 1: Related matrix data structures have appeared under many names (matrices with decay, H-matrices, rank-structured matrices, and mosaic skeleton approximation), but no single globally accepted terminology exists. For the history of these types of matrices see Ref. [37].

arXiv:1509.00309v2 [cs.DC] 9 Oct 2015

[...] (a) communication costs can be partially or fully hidden by overlapping computation and communication, (b) performance should be less sensitive to topology, latency, and CPU clock variations, (c) fine-grained, task-based parallelism is a proven means to attain high intra-node performance by leveraging massively multicore platforms and hiding the costs of the memory hierarchy (e.g., Intel TBB, Cilk), (d) the lack of global synchronization allows the overlap of multiple high-level stages of computation (e.g., two or more matrix multiplications contributing to the same expression).

The new formulation was used to implement the iterative computation of the square-root inverse of a matrix, a prototypical operation in which block ranks of intermediate matrices change dynamically during the iteration.
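To make the square-root inverse workload concrete, the following is a minimal dense sketch of one iteration that is built only from matrix multiplications. The text above does not name the iteration it uses, so the coupled Newton-Schulz scheme shown here is an assumption chosen for illustration, as are the function name `inverse_sqrt_newton_schulz`, the spectral-norm scaling, and the tolerance. It operates on ordinary NumPy arrays and does not model block-rank sparsity or distribution.

```python
import numpy as np


def inverse_sqrt_newton_schulz(a, tol=1e-10, max_iter=100):
    """Approximate A^(-1/2) for a symmetric positive-definite matrix A.

    Illustrative coupled Newton-Schulz iteration (an assumed choice, not
    necessarily the scheme used in the paper). Each step costs a few
    matrix-matrix multiplications, which is why MM performance dominates.
    """
    n = a.shape[0]
    identity = np.eye(n)

    # Scale so that the eigenvalues of the scaled matrix lie in (0, 1];
    # for SPD input the spectral norm equals the largest eigenvalue.
    scale = np.linalg.norm(a, 2)
    y = a / scale          # converges to (A / scale)^(1/2)
    z = identity.copy()    # converges to (A / scale)^(-1/2)

    for _ in range(max_iter):
        residual = identity - z @ y
        if np.linalg.norm(residual, "fro") < tol:
            break
        # T = (3 I - Z Y) / 2, written in terms of the residual I - Z Y.
        t = identity + 0.5 * residual
        y = y @ t
        z = t @ z

    # Undo the scaling: A^(-1/2) = (A / scale)^(-1/2) / sqrt(scale).
    return z / np.sqrt(scale)
```

Every step in the loop is a product of two square matrices, so a routine of this kind inherits its scalability directly from the underlying MM implementation.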
The usual advantages of the task formulation, tolerance of load imbalance and latency, are demonstrated in the regime where matrices approach full rank, by comparison against state-of-the-art dense MM implementations.
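The two features of the formulation, concurrent scheduling of multiple SUMMA iterations and fine-grained task composition, can be illustrated with a shared-memory toy. In the sketch below each block product for a given SUMMA iteration `k` is an independent task, and tasks from all iterations are submitted at once rather than separated by a barrier. This is not the authors' distributed implementation: the function name `summa_tasks`, the thread pool, and the uniform `block` size are assumptions, and the panel broadcasts of a real SUMMA are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def summa_tasks(a, b, block, max_workers=4):
    """Compute C = A @ B as a set of independent block-product tasks.

    SUMMA iteration k contributes A[:, k-panel] @ B[k-panel, :] to C.
    Here every (i, j, k) block product is its own task, and all
    iterations are in flight together (no per-iteration barrier).
    """
    n, inner_dim = a.shape
    m = b.shape[1]
    c = np.zeros((n, m), dtype=np.result_type(a, b))

    def block_product(i, j, k):
        # One fine-grained task: a single block-times-block product.
        return i, j, a[i:i + block, k:k + block] @ b[k:k + block, j:j + block]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(block_product, i, j, k)
            for k in range(0, inner_dim, block)   # SUMMA iterations
            for i in range(0, n, block)           # block rows of C
            for j in range(0, m, block)           # block columns of C
        ]
        # Accumulate in the calling thread so that no two tasks ever
        # write to the same block of C concurrently.
        for future in futures:
            i, j, contribution = future.result()
            c[i:i + block, j:j + block] += contribution

    return c
```

Because no task waits on tasks belonging to another iteration, a slow block product (for example, one involving a higher-rank block) delays only its own contribution, which is the intuition behind the tolerance of load imbalance claimed above.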