Abstract:Two--level parallelization is introduced to solve a massive block--tridiagonal matrix system. One--level is used for distributing blocks whose size is as large as the number of block rows due to the spectral basis, and the other level is used for parallelizing in the block row dimension. The purpose of the added parallelization dimension is to retard the saturation of the scaling due to communication overhead and inefficiencies in the single--level parallelization only distributing blocks. As a technique for parallelizing the tridiagonal matrix, the combined method of "Partitioned Thomas method" and "Cyclic Odd--Even Reduction" is implemented in an MPI--Fortran90 based finite element--spectral code (TORIC) that calculates the propagation of electromagnetic waves in a tokamak. The two--level parallel solver using thousands of processors shows more than 5 times improved computation speed with the optimized processor grid compared to the single--level parallel solver under the same conditions. Three--dimensional RF field reconstructions in a tokamak are shown as examples of the physics simulations that have been enabled by this algorithmic advance.