Medical image classification poses significant challenges in real-world scenarios. One major obstacle is the scarcity of labelled training data, which hampers the performance of image-classification algorithms and generalisation. Gathering sufficient labelled data is often difficult and time-consuming in the medical domain, but deep learning (DL) has shown remarkable performance, although it typically requires a large amount of labelled data to achieve optimal results. Transfer learning (TL) has played a pivotal role in reducing the time, cost, and need for a large number of labelled images. This paper presents a novel TL approach that aims to overcome the limitations and disadvantages of TL that are characteristic of an ImageNet dataset, which belongs to a different domain. Our proposed TL approach involves training DL models on numerous medical images that are similar to the target dataset. These models were then fine-tuned using a small set of annotated medical images to leverage the knowledge gained from the pre-training phase. We specifically focused on medical X-ray imaging scenarios that involve the humerus and wrist from the musculoskeletal radiographs (MURA) dataset. Both of these tasks face significant challenges regarding accurate classification. The models trained with the proposed TL were used to extract features and were subsequently fused to train several machine learning (ML) classifiers. We combined these diverse features to represent various relevant characteristics in a comprehensive way. Through extensive evaluation, our proposed TL and feature-fusion approach using ML classifiers achieved remarkable results. For the classification of the humerus, we achieved an accuracy of 87.85%, an F1-score of 87.63%, and a Cohen’s Kappa coefficient of 75.69%. For wrist classification, our approach achieved an accuracy of 85.58%, an F1-score of 82.70%, and a Cohen’s Kappa coefficient of 70.46%. The results demonstrated that the models trained using our proposed TL approach outperformed those trained with ImageNet TL. We employed visualisation techniques to further validate these findings, including a gradient-based class activation heat map (Grad-CAM) and locally interpretable model-independent explanations (LIME). These visualisation tools provided additional evidence to support the superior accuracy of models trained with our proposed TL approach compared to those trained with ImageNet TL. Furthermore, our proposed TL approach exhibited greater robustness in various experiments compared to ImageNet TL. Importantly, the proposed TL approach and the feature-fusion technique are not limited to specific tasks. They can be applied to various medical image applications, thus extending their utility and potential impact. To demonstrate the concept of reusability, a computed tomography (CT) case was adopted. The results obtained from the proposed method showed improvements.