Traditional image feature matching methods often fail to produce satisfactory results on multi-modal images, because different imaging mechanisms introduce significant nonlinear radiation distortion and geometric distortion. The key to multi-modal image matching is to suppress the nonlinear radiation distortion and to extract more robust features. This paper proposes a new robust feature matching method for multi-modal images. Our method detects feature points on phase congruency maps in a nonlinear scale space and then removes mismatches by progressive filtering. Specifically, the phase congruency maps are generated with the Log-Gabor filter. Feature points are then detected on these maps in a nonlinear scale space constructed by the nonlinear diffusion filter. Subsequently, a structure descriptor is built with the Log-Gabor filter, and the initial correspondences are established by bilateral matching. Finally, an iterative strategy removes mismatches by progressive filtering. We compare the proposed method with the SIFT, RIFT, VFC, LLT, LPM, and mTopKPR methods on multi-modal images, evaluating each method both qualitatively and quantitatively. The experimental results indicate that our method outperforms the other six matching methods.
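
The bilateral matching step can be illustrated with a minimal sketch. The abstract does not define the exact criterion, so this sketch assumes that bilateral matching means a mutual nearest-neighbour check: a pair is kept only if each descriptor is the other's closest match. The function name `bilateral_match`, the Euclidean distance, and the array layout (one descriptor per row) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def bilateral_match(desc_a, desc_b):
    """Return index pairs (i, j) that are mutual nearest neighbours.

    Hypothetical helper: assumes one descriptor per row and Euclidean
    distance; the paper's actual matching criterion may differ.
    """
    # Pairwise squared Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sum(diff * diff, axis=2)

    nn_ab = np.argmin(dist, axis=1)  # closest descriptor in B for each row of A
    nn_ba = np.argmin(dist, axis=0)  # closest descriptor in A for each row of B

    # Keep pair (i, nn_ab[i]) only if the match also holds in the reverse direction.
    idx_a = np.arange(desc_a.shape[0])
    keep = nn_ba[nn_ab] == idx_a
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)
```

Under this assumption, the returned pairs would serve as the initial correspondences, and a separate outlier-removal stage would then discard the remaining mismatches.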