Today, computing hardware is becoming increasingly powerful to keep pace with large-scale, complex computing tasks. From multi-core computers to clusters, various parallel architectures have been developed to accelerate computation. Because intelligent optimization algorithms are population-based and iterate over long periods, parallelization is both attainable and, for many complex optimization problems, imperative. Almost all existing parallel methods for intelligent optimization algorithms are built on population division with periodic communication. In several cases, performance varies with the topology and the communication mechanism used. Thus, when accelerating an intelligent optimization algorithm, the selection and design of the topology and the communication mechanism are two crucial parts, and both can be configured flexibly. That is to say, implementations of different topologies and communication mechanisms can be encapsulated into modules according to the hardware architecture. These modules are independent of the operators applied in the sub-populations and can therefore be reused in the same way as operators. Following this idea, this chapter first introduces ways of implementing intelligent optimization algorithms in parallel on different hardware architectures. It then elaborates on the typical parallel topologies based on general population division. Finally, two configurable parallelization approaches for different hardware are presented, both following the module-based configuration idea.
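The module-based idea described above can be illustrated with a minimal sketch. It is not the chapter's implementation: all names (`ring_topology`, `fully_connected_topology`, `migrate_best_replace_worst`, `gaussian_mutation_step`, `run_islands`) and the sphere test function are hypothetical, and the sub-populations are simulated sequentially in one process, whereas on real hardware each would be mapped to a core or cluster node. The point is only that the topology and the communication mechanism are passed in as interchangeable modules, separate from the operators.

```python
import random
from typing import Callable, Dict, List

Individual = List[float]
Population = List[Individual]
Links = Dict[int, List[int]]          # source island -> destination islands


def sphere(x: Individual) -> float:
    """Assumed test objective (minimization): sum of squares."""
    return sum(v * v for v in x)


# Topology modules: each maps the number of sub-populations to a link table.
def ring_topology(n: int) -> Links:
    return {i: [(i + 1) % n] for i in range(n)}


def fully_connected_topology(n: int) -> Links:
    return {i: [j for j in range(n) if j != i] for i in range(n)}


# Communication module: the best individual of each source island
# overwrites the current worst individual of each destination island.
def migrate_best_replace_worst(islands: List[Population], links: Links,
                               fitness: Callable[[Individual], float]) -> None:
    # Snapshot the emigrants first so that migration is synchronous.
    best = [min(pop, key=fitness) for pop in islands]
    for src, targets in links.items():
        for dst in targets:
            pop = islands[dst]
            worst = max(range(len(pop)), key=lambda k: fitness(pop[k]))
            pop[worst] = list(best[src])


# Operator module: Gaussian mutation that keeps a child only if it improves.
def gaussian_mutation_step(pop: Population, rng: random.Random,
                           fitness: Callable[[Individual], float] = sphere
                           ) -> Population:
    new_pop = []
    for ind in pop:
        child = [v + rng.gauss(0.0, 0.1) for v in ind]
        new_pop.append(child if fitness(child) < fitness(ind) else ind)
    return new_pop


def run_islands(operators, topology, migrate, fitness,
                generations: int = 50, interval: int = 10,
                pop_size: int = 8, dim: int = 3, seed: int = 0) -> Individual:
    """Evolve one sub-population per operator, migrating every `interval`
    generations along the links produced by `topology`."""
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                for _ in range(pop_size)] for _ in operators]
    links = topology(len(islands))
    for gen in range(1, generations + 1):
        islands = [op(pop, rng) for op, pop in zip(operators, islands)]
        if gen % interval == 0:
            migrate(islands, links, fitness)
    return min((ind for pop in islands for ind in pop), key=fitness)


if __name__ == "__main__":
    best = run_islands([gaussian_mutation_step] * 4, ring_topology,
                       migrate_best_replace_worst, sphere)
    print(sphere(best))
```

In this sketch, switching from a ring to a fully connected layout means passing `fully_connected_topology` instead of `ring_topology`; the operators and the migration routine are left unchanged, which is the kind of reuse the module-based configuration aims for.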