Face manipulation methods have developed rapidly in recent years, and their potential risk to society has motivated growing research on detection methods. However, owing to the diversity of manipulation methods and the high quality of fake images, existing detection methods generalize poorly. To address this problem, we observe that segmenting images into semantic fragments can be effective, because discriminative defects and distortions are closely related to such fragments. In addition, highlighting the discriminative regions within each fragment and measuring each fragment's contribution to the final prediction further improves generalization ability. We therefore propose a novel manipulated face detection method based on Multilevel Facial Semantic Segmentation and a Cascade Attention Mechanism. To evaluate our method, we reconstruct two datasets, GGFI and FFMI, and also collect two open-source datasets. Experiments on the four datasets verify the advantages of our approach over other state-of-the-art methods, especially in generalization ability.