In recent years, social media has become a primary news source for many people, offering convenience but also accelerating the spread of false information. With the rise of media-rich social platforms, fake news has evolved from text-only to multimodal formats, drawing increased attention to multimodal fake news detection. However, most existing methods rely on representation-level features that are closely tied to the training dataset, so they model semantic-level features insufficiently and generalize poorly to new data. To address this issue, we propose a semantically enhanced multimodal fake news detection method that uses pre-trained language models to capture implicit factual knowledge and explicitly extracts visual entities to better understand the deep semantics of multimodal news. We also extract visual features at different semantic levels, use a text-guided attention mechanism to model semantic interactions between text and images, and integrate the resulting multimodal features. Experimental results on real-world datasets built from Weibo news show that the proposed method outperforms the compared methods, achieving an accuracy of 0.895 in multimodal fake news detection.
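To make the text-guided attention step concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes scaled dot-product attention in which a single text vector acts as the query over a set of image-region feature vectors, and it assumes fusion by simple concatenation; the function names `text_guided_attention` and `fuse` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_attention(text_vec, region_vecs):
    """Attend over image regions using the text vector as the query.

    Assumption: scaled dot-product scoring; the source does not
    specify the exact attention form.
    """
    d = len(text_vec)
    # One relevance score per image region.
    scores = [
        sum(t * r for t, r in zip(text_vec, region)) / math.sqrt(d)
        for region in region_vecs
    ]
    weights = softmax(scores)
    # Weighted sum of region features, dimension by dimension.
    attended = [
        sum(w * region[i] for w, region in zip(weights, region_vecs))
        for i in range(d)
    ]
    return weights, attended


def fuse(text_vec, attended_visual):
    """Assumed fusion: concatenate text and attended visual features."""
    return list(text_vec) + list(attended_visual)


# Toy example: the first region aligns with the text, the second does not.
text = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0]]
weights, attended = text_guided_attention(text, regions)
fused = fuse(text, attended)
```

In this toy case the region that aligns with the text receives the larger weight, so the fused vector emphasizes visual content relevant to the text. A real system would operate on learned embeddings and typically feed the fused vector to a classifier.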