A hierarchical parallelisation of the multilevel fast multipole algorithm (MLFMA) for the efficient solution of large-scale problems in computational electromagnetics is presented. The tree structure of MLFMA is distributed among the processors by partitioning both the clusters and the samples of the fields appropriately for each level. The parallelisation efficiency is significantly improved compared to previous approaches, where only the clusters or only the fields are partitioned in a level.
Introduction: Surface integral equations are commonly used to formulate electromagnetic scattering and radiation problems involving complicated three-dimensional objects with arbitrary shapes [1]. By discretising the integral-equation formulations, we obtain dense matrix equations. They can be solved iteratively by accelerating the matrix-vector multiplications using the multilevel fast multipole algorithm (MLFMA) [2]. Using MLFMA, matrix-vector multiplications related to an N × N dense matrix equation can be performed in O(N log N) time using O(N log N) memory. However, accurate solutions of many real-life problems require discretisations with millions of unknowns, which cannot be solved easily by the sequential implementations of MLFMA running on a single processor. To solve such large problems, it is helpful to increase computational resources by assembling parallel computing platforms and at the same time by parallelising MLFMA. In this way, it has become possible to solve problems with 20-30 million unknowns on relatively inexpensive computing platforms [3][4][5][6][7][8].
On the other hand, parallelisation of MLFMA is not trivial owing to the complicated structure of this algorithm. Simple parallelisation strategies usually fail to provide efficient solutions because of the communication among the processors and the unavoidable duplication of some of the computations over multiple processors [9].
In this Letter, we present a hierarchical strategy for the efficient parallelisation of MLFMA. We compare our strategy with previous parallelisation schemes to demonstrate the improved efficiency, especially when the number of processors is large.
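As a rough illustration of what partitioning both the clusters and the field samples at each level could look like, the following Python sketch builds a per-level partitioning plan. The halving rule, the power-of-two processor count, and the names `hierarchical_partition`, `cluster_parts` and `sample_parts` are assumptions made for illustration only; they are not taken from the text above and should not be read as the exact scheme of this Letter.

```python
def hierarchical_partition(num_levels, num_procs):
    """Return a list of (cluster_parts, sample_parts), lowest level first.

    Assumed rule (hypothetical, for illustration): the lowest level is
    partitioned by clusters only; at each higher level the number of
    cluster partitions is halved and the number of sample partitions is
    doubled, so that their product always equals num_procs.
    """
    if num_procs < 1 or num_procs & (num_procs - 1):
        raise ValueError("num_procs is assumed to be a power of two")
    plan = []
    cluster_parts = num_procs
    for _ in range(num_levels):
        sample_parts = num_procs // cluster_parts
        plan.append((cluster_parts, sample_parts))
        # Move one level up the tree: fewer clusters, more samples per cluster.
        cluster_parts = max(1, cluster_parts // 2)
    return plan


plan = hierarchical_partition(num_levels=5, num_procs=8)
for level, (c, s) in enumerate(plan):
    print(f"level {level}: {c} cluster partitions x {s} sample partitions")
```

Under these assumptions, a plan that partitions only the clusters would keep `(num_procs, 1)` at every level, and one that partitions only the fields would keep `(1, num_procs)`; the sketch moves gradually from the first towards the second as the levels get higher.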