Swarm Intelligence (SI) algorithms are frequently applied to complex optimization problems. SI is especially useful when good solutions to NP-hard problems are required within a reasonable response time. However, when such problems have very high dimensionality, a dynamic nature, or intrinsically complex, intertwined independent variables, the computational cost of SI algorithms may still be too high. Therefore, new approaches and hardware support are needed to speed up processing. Nowadays, with the popularization of GPUs and multi-core processing, parallel versions of SI algorithms can provide the required performance on those tough problems. This paper aims to describe the state of the art of such approaches, to summarize the key points addressed, and to identify research gaps that could be better addressed. The scope of this review covers recent papers, focusing mainly on parallel implementations of the most frequently used SI algorithms. The use of nested parallelism is of particular interest, since one level of parallelism is often not sufficient to exploit the computational power of contemporary parallel hardware. The sources were drawn from the main scientific databases and filtered according to the requirements set for this literature review.