As global climate change intensifies and forest fires become more frequent, efficient and precise forest fire monitoring and image segmentation technologies are increasingly important. Flames and smoke have irregular shapes, varying sizes, and blurred boundaries, and traditional convolutional neural networks (CNNs) are limited in forest fire image segmentation by poor flame edge recognition, class imbalance, and difficulty adapting to complex scenes. This study aims to improve the accuracy and efficiency of flame recognition in forest fire images by introducing a Swin Transformer backbone network combined with an adaptive multi-scale attention mechanism and a focal loss function. Pre-training on a rich and diverse dataset allows the model to capture and understand key features of forest fire images more effectively. In our experiments, the model achieved an intersection over union (IoU) of 86.73% and a precision of 91.23%, a significant improvement over traditional models on forest fire image segmentation tasks. A series of ablation experiments validates the importance of each of these improvements: the Swin Transformer provides more refined feature extraction, the adaptive multi-scale attention mechanism helps the model focus on key regions, and the focal loss function addresses class imbalance. Together, these innovations make the model more precise and robust in forest fire image segmentation and provide strong technical support for future forest fire monitoring and prevention.
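
To make the quantities named in the abstract concrete, the following is a minimal sketch of a binary focal loss and of the IoU and precision metrics for a fire/background mask. It is an illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, masks are treated as flat lists of 0/1 pixel labels, and the defaults `alpha=0.25` and `gamma=2.0` are commonly used focal loss values that the abstract does not specify.

```python
import math


def binary_focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over flattened pixels.

    probs   -- predicted fire probabilities in [0, 1]
    targets -- ground-truth labels, 1 for fire and 0 for background
    alpha, gamma -- assumed defaults; not taken from the paper
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma down-weights pixels that are already well classified,
        # so rare, hard fire pixels contribute more to the loss.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def iou_and_precision(pred_mask, true_mask):
    """IoU and precision of the fire class for two flat 0/1 masks."""
    pairs = list(zip(pred_mask, true_mask))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    # Convention assumed here: return 0.0 when a denominator is empty.
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return iou, precision


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1]
    true = [1, 0, 0, 1, 1]
    print(iou_and_precision(pred, true))
    print(binary_focal_loss([0.9, 0.6, 0.2], [1, 1, 0]))
```

With `gamma=0` and `alpha=0.5`, this loss reduces to half the ordinary binary cross-entropy, which shows that the focusing term is what shifts weight toward hard, minority-class pixels.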