In recent years, estimating tobacco field area has become a critical component of precision tobacco cultivation. However, traditional satellite remote sensing methods suffer from high cost, low accuracy, and susceptibility to noise, which makes it difficult to meet the demand for high-precision estimates. Optical remote sensing methods also perform poorly in regions with complex terrain. Unmanned aerial vehicle (UAV) multispectral remote sensing has therefore emerged as a viable alternative because of its high resolution and rich spectral information. This study used a DJI Mavic 3M equipped with high-resolution RGB and multispectral cameras to collect tobacco field data in Agang Town, Luoping County, Yunnan Province, China, covering five bands: RGB, RED, RED EDGE, NIR, and GREEN. To ensure the accuracy of the experiment, we trained models on 337, 242, and 215 segmented tobacco field images, using both the RGB channels and the seven-channel data. We developed a tobacco field semantic segmentation method based on PP-LiteSeg and customized the model extensively to suit the characteristics of multispectral images. The input layer was modified to accept multiple channels so that the model could make full use of the multispectral information. The model comprised an encoder, a decoder, and a Simple Pyramid Pooling Module (SPPM), and used a multi-layer convolutional structure for feature extraction and segmentation of multispectral images. The results indicated that, compared with traditional RGB images, multispectral images offered significant advantages for semantic segmentation along field edges and in complex terrain. Specifically, the area predicted from the seven-channel data was 11.43 m² larger than that predicted from the RGB channels, and the seven-channel model achieved a prediction accuracy of 98.84%.
This study provides an efficient and feasible solution for estimating tobacco field areas based on multispectral images, offering robust support for modern agricultural management.
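
As an illustration only, the following minimal Python sketch shows one way a seven-channel input could be assembled and a predicted mask converted into an area. It assumes that the seven channels are the three RGB planes plus the four single-band images (RED, RED EDGE, NIR, GREEN), that all bands are co-registered to the same pixel grid, and that area is obtained by counting foreground pixels and multiplying by the squared ground sampling distance; none of these details is stated above. The function names `stack_channels` and `mask_area_m2` and the parameter `gsd_m` are hypothetical and are not taken from the study or from PP-LiteSeg.

```python
from typing import List, Sequence

# One 2-D band stored as rows of pixel values (hypothetical representation).
Band = List[List[float]]


def stack_channels(rgb: Sequence[Band], single_bands: Sequence[Band]) -> List[Band]:
    """Concatenate RGB planes and single-band planes into one multi-channel image.

    Assumption: every band has already been co-registered to the same grid.
    """
    channels = list(rgb) + list(single_bands)
    height, width = len(channels[0]), len(channels[0][0])
    for ch in channels:
        # Reject bands whose shape differs from the first band.
        if len(ch) != height or any(len(row) != width for row in ch):
            raise ValueError("all bands must share the same height and width")
    return channels


def mask_area_m2(mask: Sequence[Sequence[int]], gsd_m: float) -> float:
    """Area of foreground pixels (value 1) in square metres.

    gsd_m is an assumed uniform ground sampling distance in metres per pixel.
    """
    n_pixels = sum(1 for row in mask for value in row if value == 1)
    return n_pixels * gsd_m ** 2


# Toy 2x3 example: three RGB planes plus four single-band planes.
plane = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
image = stack_channels([plane] * 3, [plane] * 4)
toy_mask = [[1, 1, 0], [0, 1, 0]]
toy_area = mask_area_m2(toy_mask, gsd_m=0.5)
```

In this toy case the stacked image has seven channels and the three foreground pixels at 0.5 m per pixel give 0.75 m². A real pipeline would additionally need orthorectified imagery for the per-pixel area to be uniform.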