Land cover (LC) and land use (LU) have commonly been classified separately from remotely sensed imagery, without considering the intrinsically hierarchical and nested relationships between them. In this paper, for the first time, a novel joint deep learning framework is proposed and demonstrated for LC and LU classification. The proposed Joint Deep Learning (JDL) model incorporates a multilayer perceptron (MLP) and a convolutional neural network (CNN), and is implemented via a Markov process involving iterative updating. In the JDL, the LU classification conducted by the CNN is made conditional upon the LC probabilities predicted by the MLP. In turn, the LU probabilities predicted by the CNN, together with the original imagery, are re-used as inputs to the MLP to strengthen the spatial and spectral feature representations. This process of updating the MLP and the CNN forms a joint distribution, in which LC and LU are classified simultaneously through iteration. The proposed JDL method provides a general framework within which the pixel-based MLP and the patch-based CNN supply mutually complementary information, such that both classifications are refined through iteration. Given the well-known complexities associated with the classification of very fine spatial resolution (VFSR) imagery, the effectiveness of the proposed JDL was tested on aerial photography of two large urban and suburban areas in Great Britain (Southampton and Manchester). The JDL consistently demonstrated greatly increased accuracies with increasing iteration, not only for the LU classification, but for both the LC and LU classifications, with the greatest accuracies for each achieved at around 10 iterations. The average overall classification accuracies were 90.18% for LC and 87.92% for LU across the two study sites, far higher than the initial accuracies and consistently outperforming benchmark comparators (three each for LC and LU classification). This research thus represents the first attempt to unify the remote sensing classification of LC (state; what is there?) and LU (function; what is going on there?), where previously each had been considered only separately. It, therefore, has the potential to transform the way that LC and LU classification is undertaken in future. Moreover, it paves the way to effectively and automatically addressing the complex task of classifying LC and LU from VFSR remotely sensed imagery via joint reinforcement.
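To make the iterative coupling described above concrete, the following is a minimal sketch of the JDL loop under stated assumptions: `mlp` and `cnn` are hypothetical placeholders for the trained pixel-based and patch-based classifiers, their `predict` methods are assumed to return per-pixel probability surfaces, and the exact conditioning and feature-stacking details of the published model may differ.

```python
import numpy as np

def joint_deep_learning(image, mlp, cnn, n_iterations=10):
    """Sketch of the iterative JDL loop: the MLP predicts per-pixel LC
    probabilities, the CNN predicts patch-based LU probabilities conditioned
    on them, and the LU probabilities are fed back to the MLP together with
    the original imagery."""
    # The MLP initially sees only the original spectral bands (H x W x bands).
    mlp_input = image
    for _ in range(n_iterations):
        # Pixel-based LC probabilities from the MLP (H x W x n_lc_classes).
        lc_probs = mlp.predict(mlp_input)
        # Patch-based LU probabilities from the CNN, conditional on the LC
        # probability surfaces (H x W x n_lu_classes).
        lu_probs = cnn.predict(lc_probs)
        # Re-use the LU probabilities, stacked with the original imagery, as
        # MLP input to strengthen the spectral-spatial representation.
        mlp_input = np.concatenate([image, lu_probs], axis=-1)
    # Hard LC and LU maps from the final probability surfaces.
    return lc_probs.argmax(axis=-1), lu_probs.argmax(axis=-1)
```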