Wheat is a major grain crop in China, accounting for one-fifth of national grain production. Drought stress severely affects the normal growth and development of wheat, reducing yield and quality and, in severe cases, causing total crop failure. To address the lag and limitations inherent in traditional drought monitoring methods, this paper proposes S-DNet, a multimodal deep learning model for monitoring drought stress in winter wheat during its critical growth periods. Images of drought-stressed winter wheat were acquired during the Rise-Jointing, Heading-Flowering and Flowering-Maturity stages and paired with the corresponding soil moisture monitoring data to establish a dataset. DenseNet-121 was selected as the base network for extracting drought features. Combining the drought phenotypic characteristics of wheat in the field with meteorological factors and IoT technology, the study fused the meteorological drought index SPEI (Standardized Precipitation Evapotranspiration Index), derived from wireless sensor network (WSN) measurements, with deep image features to build the multimodal S-DNet model. The results show that, compared with the single-modal DenseNet-121 model, the multimodal S-DNet model is more robust and generalizes better, reaching an average drought recognition accuracy of 96.4%. The model thus enables non-destructive, accurate, and rapid monitoring of drought stress in winter wheat.
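
To make the multimodal idea concrete, the following is a minimal sketch of feature-level fusion, in which an image feature vector is concatenated with an SPEI value before classification. It is an illustration under stated assumptions and not the S-DNet implementation: the abstract does not specify the fusion mechanism, the number of drought classes, or any parameter values, so the function names, the three-class output, and the toy weights below are all hypothetical. In practice the image features would come from a CNN backbone such as DenseNet-121; here they are a short placeholder list.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(
    image_features: Sequence[float],
    spei: float,
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """Hypothetical fusion head: concatenate image features with SPEI,
    apply one linear layer, and return class probabilities.

    Each row of `weights` must have len(image_features) + 1 entries,
    the last entry being the weight applied to the SPEI value.
    """
    fused = list(image_features) + [spei]
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("weight row length must match fused vector length")
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy, hand-picked parameters (assumed for illustration only).
# Rows correspond to three hypothetical classes: none, moderate, severe.
TOY_WEIGHTS = [
    [1.0, -1.0, 2.0],
    [0.0, 0.0, 0.0],
    [-1.0, 1.0, -2.0],
]
TOY_BIASES = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    features = [0.2, 0.7]  # placeholder for CNN-derived image features
    for spei_value in (-1.5, 1.5):
        probs = fuse_and_classify(features, spei_value, TOY_WEIGHTS, TOY_BIASES)
        print(spei_value, [round(p, 3) for p in probs])
```

With these toy weights, the same image features are assigned to a different class depending on the SPEI value, which is the kind of complementary signal a second modality is meant to provide.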