SUMMARY
Present-day engineering optimization problems often impose large computational demands, resulting in long solution times even on a modern high-end processor. To obtain enhanced computational throughput and global search capability, we detail the coarse-grained parallelization of an increasingly popular global search method, the particle swarm optimization (PSO) algorithm. Parallel PSO performance was evaluated using two categories of optimization problems possessing multiple local minima: large-scale analytical test problems with computationally cheap function evaluations, and medium-scale biomechanical system identification problems with computationally expensive function evaluations. For load-balanced analytical test problems formulated using 128 design variables, speedup was close to ideal and parallel efficiency was above 95% for up to 32 nodes on a Beowulf cluster. In contrast, for load-imbalanced biomechanical system identification problems with 12 design variables, speedup plateaued and parallel efficiency decreased almost linearly with increasing number of nodes. The primary factor affecting parallel performance was the synchronization requirement of the parallel algorithm, which dictated that each iteration must wait for completion of the slowest fitness evaluation. When the analytical problems were solved using a fixed number of swarm iterations, a single population of 128 particles produced a better convergence rate than did multiple independent runs performed using sub-populations (8 runs with 16 particles, 4 runs with 32 particles, or 2 runs with 64 particles). These results suggest that (1) parallel PSO exhibits excellent parallel performance under load-balanced conditions, (2) an asynchronous implementation would be valuable for real-life problems subject to load imbalance, and (3) larger population sizes should be considered when multiple processors are available.
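The effect of the synchronization requirement on speedup and parallel efficiency can be illustrated with a minimal timing model. The sketch below is not the authors' implementation: the function name `sync_parallel_metrics`, the round-robin assignment of particles to nodes, and the neglect of communication cost are all assumptions made for illustration. It computes speedup as serial time divided by parallel time, and parallel efficiency as speedup divided by the number of nodes, for a single swarm iteration.

```python
def sync_parallel_metrics(eval_times, n_nodes):
    """Speedup and efficiency of one synchronous swarm iteration.

    eval_times: fitness evaluation time of each particle (hypothetical data).
    n_nodes:    number of nodes sharing the evaluations.
    Communication cost is ignored (an assumption of this sketch).
    """
    # Assumed static round-robin assignment of particles to nodes.
    node_loads = [0.0] * n_nodes
    for i, t in enumerate(eval_times):
        node_loads[i % n_nodes] += t

    serial_time = sum(eval_times)
    # Synchronization: the iteration ends only when the slowest node finishes.
    parallel_time = max(node_loads)

    speedup = serial_time / parallel_time
    efficiency = speedup / n_nodes
    return speedup, efficiency


# Load-balanced case: 32 equally expensive evaluations on 32 nodes.
balanced = sync_parallel_metrics([1.0] * 32, 32)

# Load-imbalanced case: one evaluation takes four times longer than the rest.
imbalanced = sync_parallel_metrics([1.0] * 31 + [4.0], 32)

print(balanced)    # (32.0, 1.0)
print(imbalanced)  # (8.75, 0.2734375)
```

In this toy model a single slow evaluation reduces efficiency from 100% to about 27%, because 31 nodes sit idle while the slowest evaluation completes. The numbers are illustrative only and are not the results reported in the study.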
The high computational cost of complex engineering optimization problems has motivated the development of parallel optimization algorithms. A recent example is the parallel particle swarm optimization (PSO) algorithm, which is valuable due to its global search capabilities. Unfortunately, because existing parallel implementations are synchronous (parallel synchronous PSO, PSPSO), they do not make efficient use of computational resources when a load imbalance exists. In this study, we introduce a parallel asynchronous PSO (PAPSO) algorithm to enhance computational efficiency. The performance of the PAPSO algorithm was compared to that of a PSPSO algorithm in homogeneous and heterogeneous computing environments for small- to medium-scale analytical test problems and a medium-scale biomechanical test problem. For all problems, the robustness and convergence rate of PAPSO were comparable to those of PSPSO. However, the parallel performance of PAPSO was significantly better than that of PSPSO for heterogeneous computing environments or heterogeneous computational tasks. For example, PAPSO was 3.5 times faster than PSPSO for the biomechanical test problem executed on a heterogeneous cluster with 20 processors. Overall, PAPSO exhibits excellent parallel performance when a large number of processors (more than about 15) is utilized and either (1) heterogeneity exists in the computational task or environment, or (2) the computation-to-communication time ratio is relatively small.
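The benefit of asynchronous dispatch in a heterogeneous environment can be sketched with a simple scheduling model. This is an illustration under stated assumptions, not the PAPSO algorithm itself: the functions `sync_wall_time` and `async_wall_time`, the `work` and `speeds` inputs, the static particle-to-worker mapping in the synchronous case, and the first-idle-worker dispatch rule in the asynchronous case are all hypothetical, and communication cost is ignored. Only the timing of fitness evaluations is modeled, not the swarm update.

```python
import heapq


def sync_wall_time(work, speeds, n_iters):
    """Synchronous model: particle i always runs on worker i % n_workers,
    and every iteration waits for the slowest worker."""
    n_workers = len(speeds)
    loads = [0.0] * n_workers
    for i, w in enumerate(work):
        k = i % n_workers
        loads[k] += w / speeds[k]
    return n_iters * max(loads)


def async_wall_time(work, speeds, n_iters):
    """Asynchronous model: each pending evaluation is sent to whichever
    worker becomes idle first, with no barrier between iterations."""
    # Min-heap of (time at which the worker becomes idle, worker index).
    idle_at = [(0.0, k) for k in range(len(speeds))]
    heapq.heapify(idle_at)
    finish = 0.0
    for _ in range(n_iters):
        for w in work:
            t, k = heapq.heappop(idle_at)
            done = t + w / speeds[k]
            finish = max(finish, done)
            heapq.heappush(idle_at, (done, k))
    return finish


# Hypothetical setup: 4 equal evaluations per iteration, 10 iterations,
# 4 workers of which one runs at a quarter of the speed of the others.
work = [1.0] * 4
speeds = [1.0, 1.0, 1.0, 0.25]

print(sync_wall_time(work, speeds, 10))   # 40.0
print(async_wall_time(work, speeds, 10))  # 13.0
```

In this toy setup the synchronous schedule is paced entirely by the slow worker, whereas the asynchronous schedule lets the fast workers absorb most of the evaluations. The ratio obtained here depends on the assumed workload and should not be read as reproducing the 3.5-fold figure reported above.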