Food non-destructive detection technology (NDDT) is a strong driver of progress in food safety and quality. One of the essential tasks of food quality regulation is the non-destructive detection of a food’s nutrient content. However, existing nutrient NDDT methods perform poorly in terms of efficiency and accuracy, which hinders their widespread application to daily meals. Therefore, this paper proposed an end-to-end, non-destructive food nutrition detection method, named Swin-Nutrition, which combined deep learning with NDDT to evaluate the nutrient content of food. The method aimed to fully capture the feature information in food images and thus accurately estimate nutrient content. Swin-Nutrition comprised three components: a Swin Transformer, a feature fusion module (FFM), and a nutrient prediction module. In particular, the Swin Transformer acted as the backbone network for extracting features from food images, and the FFM was used to obtain a discriminative feature representation that improved prediction accuracy. The experimental results on the Nutrition5k dataset demonstrated the effectiveness and efficiency of the proposed method. Specifically, the mean values of the percentage mean absolute error (PMAE) for calories, mass, fat, carbohydrate, and protein were only 15.3%, 12.5%, 22.1%, 20.8%, and 15.4%, respectively. We hope that our simple and effective method will provide a solid foundation for research on food NDDT.
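
The abstract reports PMAE without defining it. The sketch below shows one common way such a metric is computed, assuming PMAE is the mean absolute error divided by the mean of the ground-truth values and expressed as a percentage. That definition, the function name `pmae`, and the sample numbers are illustrative assumptions, not details taken from this abstract.

```python
def pmae(predictions, targets):
    """Percentage mean absolute error.

    Assumed definition (not stated in the abstract):
        PMAE = 100 * MAE / mean(targets)
    """
    if len(predictions) != len(targets) or len(targets) == 0:
        raise ValueError("predictions and targets must be non-empty and equal in length")

    # Mean absolute error between predicted and measured values.
    mae = sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

    # Normalise by the mean ground-truth value so that nutrients measured
    # on different scales (kcal, grams) can be compared.
    mean_target = sum(targets) / len(targets)
    return 100.0 * mae / mean_target


# Hypothetical calorie values for three dishes (made up for illustration).
predicted_calories = [250.0, 410.0, 120.0]
measured_calories = [300.0, 400.0, 100.0]

calorie_pmae = pmae(predicted_calories, measured_calories)
print(f"Calorie PMAE: {calorie_pmae:.1f}%")
```

Under this assumed definition, a lower PMAE means the predictions deviate less from the measured values relative to the typical magnitude of that nutrient, which is why the per-nutrient figures can be compared side by side.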