Abstract-Compared with MPI, OpenMP offers an easier way to parallelize the multilevel fast multipole algorithm (MLFMA) on shared-memory systems. However, an OpenMP implementation has many pitfalls, because the complicated structure of the MLFMA gives its different parts distinct numerical characteristics. These pitfalls often lead to very low parallel efficiency, especially when many threads are employed. Through an in-depth investigation of these pitfalls by analysis and numerical experiments, we propose an efficient OpenMP parallel MLFMA. The parallelization rests on two strategies: 1) loop reorganization for the far-field interaction in the MLFMA; and 2) determination of a transition level. Numerical experiments on large-scale targets show that the proposed OpenMP parallel scheme performs as efficiently as its MPI counterpart and much more efficiently than a straightforward OpenMP parallelization.