Aerial image simulation is a fundamental problem in advanced lithography for chip fabrication. Because it requires a very large number of mathematical computations, an efficient yet accurate implementation is a necessity. In the literature, GPUs and FPGAs have demonstrated their potential for accelerating aerial image simulation. However, the comparisons of GPUs and FPGAs against CPUs have not been thorough. In particular, previous works did not carefully tune the CPU-based methods, even though recent CPU architectures have changed significantly to improve their high-performance computing capabilities. In this paper, we present and discuss several algorithms for aerial image simulation on multi-core SIMD CPUs. On a single hex-core SIMD CPU, our fastest method achieves up to a 73X speedup over the baseline serial approach and runs up to 2X faster than the state-of-the-art GPU-based approach. We show that the performance of multi-core SIMD CPUs is promising, and that careful CPU tuning is necessary to exploit their computing capabilities.