We show how to multiply two $n \times n$ matrices $S$ and $T$ over semirings in the Congested Clique model, where $n$ nodes communicate in a fully connected synchronous network using $O(\log n)$-bit messages, within $O(\mathrm{nz}(S)^{1/3}\,\mathrm{nz}(T)^{1/3}/n + 1)$ rounds of communication, where $\mathrm{nz}(S)$ and $\mathrm{nz}(T)$ denote the number of non-zero elements in $S$ and $T$, respectively. By leveraging the sparsity of the input matrices, our algorithm greatly reduces communication costs compared with general multiplication algorithms [Censor-Hillel et al., PODC 2015], and thus improves upon the state of the art for matrices with $o(n^2)$ non-zero elements. Moreover, our algorithm also surpasses previous solutions when only one of the two matrices is sparse. In particular, this makes it possible to efficiently raise a sparse matrix to a power greater than 2. As applications, we show how to speed up 4-cycle counting and all-pairs shortest paths on non-dense graphs.

Our algorithmic contribution is a new deterministic method of restructuring the input matrices in a sparsity-aware manner, which assigns each node element-wise multiplication tasks that are not necessarily consecutive but guarantee a balanced element distribution, providing for communication-efficient multiplication.

Moreover, this new deterministic method for restructuring matrices may be used to restructure the adjacency matrix of input graphs, enabling faster deterministic solutions for graph-related problems. As an example, we present a new sparsity-aware, deterministic algorithm that solves the triangle listing problem in $O(m/n^{5/3} + 1)$ rounds, a complexity that was previously obtained by a randomized algorithm [Pandurangan et al., SPAA 2018], and that matches the known lower bound of $\Omega(n^{1/3})$ when $m = n^2$ [Izumi and Le Gall, PODC 2017; Pandurangan et al., SPAA 2018].
Naturally, our triangle listing algorithm also implies triangle counting within the same complexity of $O(m/n^{5/3} + 1)$ rounds, which is (possibly more than) a cubic improvement over the previously known deterministic $O(m^2/n^3)$-round algorithm [Dolev et al., DISC 2012].

* A preliminary version of this paper appeared in OPODIS 2018.

An important case of Theorem 1, especially when squaring the adjacency matrix of a graph in order to solve graph problems, is when the sparsities of the input matrices are roughly the same. In such a case, Theorem 1 gives the following.

Corollary 1. Given two $n \times n$ matrices $S$ and $T$, where $O(\mathrm{nz}(S)) = O(\mathrm{nz}(T)) = m$, Algorithm SMM deterministically computes the product $P = S \cdot T$ over a semiring in the Congested Clique model, within $O(m^{2/3}/n + 1)$ rounds.

Notice that for $m = O(n^2)$, Corollary 1 gives the same complexity of $O(n^{1/3})$ rounds as given by the semiring multiplication of [9].

We apply Algorithm SMM to 4-cycle counting, obtaining the following.

Theorem 2. There is a deterministic algorithm that computes the number of 4-cycles in an $n$-node graph $G$ in $O(m^{2/3}/n + 1)$ rounds in the Congested Clique model, where $m$ is the number of edges of $G$.

No...
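To make the link between squaring the adjacency matrix and 4-cycle counting concrete, the following is a minimal centralized sketch in Python. It is not Algorithm SMM and does not simulate the Congested Clique. It assumes a standard counting identity that is not stated above: if two distinct vertices $u, v$ have $c_{uv} = (A^2)_{uv}$ common neighbors, they are opposite corners of $\binom{c_{uv}}{2}$ 4-cycles, and each 4-cycle is counted once for each of its two diagonals. The function names `count_4cycles` and `smm_round_bound` are hypothetical, and the bound helper drops all constants hidden by the $O(\cdot)$ notation.

```python
from collections import defaultdict


def count_4cycles(n, edges):
    """Count 4-cycles in a simple undirected graph on vertices 0..n-1.

    Centralized illustration only: it computes the off-diagonal entries
    of A^2 (common-neighbor counts) sparsely, then applies
    C4 = (1/2) * sum_{u<v} choose((A^2)[u][v], 2).
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops; the sets drop duplicate edges
            adj[u].add(v)
            adj[v].add(u)

    # common[(u, v)] = number of common neighbors of u and v, for u < v.
    common = defaultdict(int)
    for w in range(n):
        nbrs = sorted(adj[w])
        for i, u in enumerate(nbrs):
            for v in nbrs[i + 1:]:
                common[(u, v)] += 1

    # Each 4-cycle has two diagonals, so it is counted exactly twice.
    return sum(c * (c - 1) // 2 for c in common.values()) // 2


def smm_round_bound(n, m):
    """Evaluate m^(2/3)/n + 1, the expression in Corollary 1, without constants."""
    return m ** (2 / 3) / n + 1
```

For instance, `count_4cycles(4, [(0, 1), (1, 2), (2, 3), (3, 0)])` returns 1, and with $m = n^2$ the helper evaluates to roughly $n^{1/3} + 1$, in line with the remark after Corollary 1.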