Supervised classifiers need large amounts of accurately labeled data to learn to recognize chest X-ray (CXR) images. However, manually labeling an extensive collection of CXR images is time-consuming and costly. To address this issue, we propose a method for the semi-supervised labeling of extensive CXR image collections that leverages unsupervised clustering and minimal expert knowledge to generate ground truth images. The proposed methodology entails four steps. First, the images are grouped using unsupervised clustering techniques such as K-Means and Self-Organizing Maps. Second, each image is represented by five different feature vectors so that the differences between features can be exploited to their full advantage. Third, each data point receives the label of the center of the cluster to which it belongs. Finally, a majority vote is used to decide the ground truth image. The amount of human involvement is strictly limited by the number of clusters created by the chosen method. To evaluate the effectiveness of the proposed method, experiments were conducted on two publicly available CXR datasets, VinDr-CXR and Montgomery. The experiments showed that, for a KNN classifier, manually labeling only 1% (VinDr-CXR) or 10% (Montgomery) of the training data yields performance similar to labeling the whole dataset. To our knowledge, this is the first study to use the VinDr-CXR and Montgomery datasets for ground truth image generation. Extensive experimental analysis using machine learning and statistical techniques shows that the proposed methodology efficiently generates ground truth images from publicly available CXR datasets.
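The labeling steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain K-Means with a deterministic farthest-first initialization, three toy feature views instead of five, binary labels, and a hypothetical `expert_label` callback standing in for the human expert, who is asked to label only the sample nearest to each cluster center. All function names and the toy data are illustrative assumptions.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic initialization (an assumption of this sketch)."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def assign_points(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-Means; returns (centers, assignments)."""
    centers = farthest_first_init(points, k)
    for _ in range(iters):
        assign = assign_points(points, centers)
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign_points(points, centers)


def propagate_labels(points, k, expert_label):
    """Label one representative per cluster, then copy it to all members."""
    centers, assign = kmeans(points, k)
    cluster_label = {}
    for c in set(assign):
        members = [i for i, a in enumerate(assign) if a == c]
        rep = min(members, key=lambda i: sq_dist(points[i], centers[c]))
        cluster_label[c] = expert_label(rep)  # the only human involvement
    return [cluster_label[a] for a in assign]


def majority_vote(label_lists):
    """Per-sample majority over the labels from each feature view."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*label_lists)]


# Toy data: six "images", three hypothetical feature views.
true_labels = [0, 0, 0, 1, 1, 1]
views = [
    [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]],
    [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]],
    [[0.0], [0.2], [4.9], [5.0], [5.1], [5.3]],  # sample 2 is misplaced here
]

queries = []  # records which samples the expert was asked about


def expert_label(i):
    queries.append(i)
    return true_labels[i]


per_view = [propagate_labels(v, k=2, expert_label=expert_label) for v in views]
final = majority_vote(per_view)
print(final, "expert queries:", len(queries))
```

In this toy run the third view mislabels one sample, and the vote across views corrects it. The expert is queried once per cluster per view, which mirrors the point that the number of clusters bounds the human effort. An odd number of views avoids ties for binary labels; other settings would need an explicit tie-breaking rule.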