There is growing interest in food image recognition for a wide range of applications. Among existing methods, mid-level image part-based approaches show promising performance because they are well suited to modelling deformable food parts (FPs). However, their achievable accuracy is limited because the FP representations rely on low-level features. Benefiting from the capacity to learn powerful features from labelled data, deep learning approaches have achieved state-of-the-art performance in several food image recognition problems. Mid-level-based approaches and deep convolutional neural network (DCNN) approaches each have their respective advantages and, more importantly, can be considered complementary. As such, the authors propose a novel framework that better utilises DCNN features for food images by jointly exploiting the advantages of both the mid-level-based approaches and the DCNN approaches. Furthermore, they tackle the challenge of training a DCNN model with unlabelled mid-level part data by designing a clustering-based FP label mining scheme that generates part-level labels from the unlabelled data. They evaluate the framework on three benchmark food image datasets, and the numerical results demonstrate that the proposed approach achieves competitive performance compared with existing food image recognition approaches.
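
As an illustration only, the idea of mining part-level labels by clustering can be sketched as follows. The abstract does not specify the clustering algorithm, the part features, or the number of clusters, so this sketch assumes plain k-means with a deterministic farthest-point initialisation over precomputed part feature vectors. The names `mine_part_labels`, `features`, and `n_clusters` are hypothetical and are not taken from the authors' method.

```python
import numpy as np


def mine_part_labels(features, n_clusters, n_iters=20):
    """Assign a pseudo-label (cluster index) to each unlabelled part.

    features: array of shape (n_parts, dim), one feature vector per part.
    Assumption: plain k-means stands in for the unspecified clustering step.
    """
    features = np.asarray(features, dtype=float)

    # Farthest-point initialisation: start from the first sample, then
    # repeatedly add the sample farthest from all centroids chosen so far.
    centroids = [features[0]]
    for _ in range(1, n_clusters):
        nearest = np.min(
            [((features - c) ** 2).sum(axis=1) for c in centroids], axis=0
        )
        centroids.append(features[nearest.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iters):
        # Squared Euclidean distance from every part to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the parts assigned to it.
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)

    # Final assignment against the updated centroids.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids
```

Under these assumptions, the returned cluster indices could serve as part-level training targets for a DCNN in place of manual annotations; how the mined labels are actually used in training is not described in the abstract.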