Common Midpoint (CMP) and Common Reflection Surface (CRS) are widely used methods for improving the signal-to-noise ratio in the field of seismic processing. These methods are computationally intensive and require high-performance computing. This paper optimizes these methods on the Sunway many-core architecture and implements large-scale seismic processing on the Sunway TaihuLight supercomputer. We propose the following three optimization techniques: 1) a software cache method that reduces the overhead of memory accesses and shares data among CPEs via register communication; 2) a re-designed semblance calculation procedure that further reduces the overhead of memory accesses; 3) a vectorization method that improves performance when processing small volumes of data within short loops. The experimental results show that our implementations of the CMP and CRS methods on Sunway achieve 3.50× and 3.01× speedup on average, respectively, compared to the state-of-the-art implementations on CPU. In addition, our implementation is capable of running on more than one million cores of Sunway TaihuLight with good scalability.
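For readers unfamiliar with the semblance calculation mentioned above, the following is a minimal sketch of the conventional semblance coherence measure, assuming the standard textbook definition (stacked energy divided by the number of traces times the total energy over a time window). The function name `semblance` and the list-of-lists input layout are illustrative assumptions; this is not the re-designed procedure proposed in the paper, and it does not model the Sunway software cache or vectorization.

```python
def semblance(window):
    """Conventional semblance of a window of traces.

    window: list of traces, each a list of samples covering the same
    time window (hypothetical layout chosen for this sketch).
    Returns a value between 0 and 1; 1 means perfectly coherent traces.
    """
    n_traces = len(window)
    n_samples = len(window[0])
    numerator = 0.0    # sum over time of (sum over traces)^2
    denominator = 0.0  # sum over time of sum over traces of amplitude^2
    for t in range(n_samples):
        stacked = 0.0
        energy = 0.0
        for trace in window:
            a = trace[t]
            stacked += a
            energy += a * a
        numerator += stacked * stacked
        denominator += energy
    if denominator == 0.0:
        return 0.0  # all-zero window: define semblance as 0
    return numerator / (n_traces * denominator)


if __name__ == "__main__":
    coherent = [[1.0, 2.0, 3.0]] * 3           # identical traces
    cancelling = [[1.0, -1.0], [-1.0, 1.0]]    # traces that cancel when stacked
    print(semblance(coherent), semblance(cancelling))
```

The inner loop touches every sample of every trace in the window, which suggests why memory access overhead can dominate this kind of computation.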