Based on the current research on the wine grape variety recognition task, it has been found that traditional deep learning models relying only on a single feature (e.g., fruit or leaf) for classification can face great challenges, especially when there is a high degree of similarity between varieties. In order to effectively distinguish these similar varieties, this study proposes a multisource information fusion method, which is centered on the SynthDiscrim algorithm, aiming to achieve a more comprehensive and accurate wine grape variety recognition. First, this study optimizes and improves the YOLOV7 model and proposes a novel target detection and recognition model called WineYOLO-RAFusion, which significantly improves the fruit localization precision and recognition compared with YOLOV5, YOLOX, and YOLOV7, which are traditional deep learning models. Secondly, building upon the WineYOLO-RAFusion model, this study incorporated the method of multisource information fusion into the model, ultimately forming the MultiFuseYOLO model. Experiments demonstrated that MultiFuseYOLO significantly outperformed other commonly used models in terms of precision, recall, and F1 score, reaching 0.854, 0.815, and 0.833, respectively. Moreover, the method improved the precision of the hard to distinguish Chardonnay and Sauvignon Blanc varieties, which increased the precision from 0.512 to 0.813 for Chardonnay and from 0.533 to 0.775 for Sauvignon Blanc. In conclusion, the MultiFuseYOLO model offers a reliable and comprehensive solution to the task of wine grape variety identification, especially in terms of distinguishing visually similar varieties and realizing high-precision identifications.