Most parallel applications suffer from load imbalance, a major cause of performance degradation. In particle simulations, the imbalance arises mainly because particles migrate between processing elements and eventually accumulate unevenly across them. Dynamic load balancing is used at various itera-