Self-supervised contrastive learning offers a means of learning informative features from a pool of unlabeled data. In this paper, we investigate another useful approach. We propose an entirely unlabeled coreset selection method. In this regard, contrastive learning, one of several self-supervised methods, was recently proposed and has consistently delivered the highest performance. This prompted us to choose two leading methods for contrastive learning: the simple framework for contrastive learning of visual representations (SimCLR) and the momentum contrastive (MoCo) learning framework. We calculated the cosine similarities for each example of an epoch for the entire duration of the contrastive learning process and subsequently accumulated the cosine similarity values to obtain the coreset score. Our assumption was that a sample with low similarity would likely behave as a coreset. Compared with existing coreset selection methods with labels, our approach reduced the cost associated with human annotation. In this study, the unsupervised method implemented for coreset selection achieved improvements of 1.25% (for CIFAR10), 0.82% (for SVHN), and 0.19% (for QMNIST) over a randomly selected subset with a size of 30%. Furthermore, our results are comparable to those of the existing supervised coreset selection methods. The differences between the proposed and the above mentioned supervised coreset selection method (forgetting events) were 0.81% on the CIFAR10 dataset, −2.08% on the SVHN dataset (the proposed method outperformed the existing method), and 0.01% on the QMNIST dataset at a subset size of 30%. In addition, our proposed approach exhibited robustness even if the coreset selection model and target model were not identical (e.g., using ResNet18 as a selection model and ResNet101 as the target model). Lastly, we obtained more concrete proof that our coreset examples are highly informative by showing the performance gap between the coreset and non-coreset samples in the coreset cross test experiment. We observed a pair of performance ( ( testing : non-coreset, training : coreset ), ( testing : coreset, training : non-coreset ) ), i.e. (94.27%, 67.39 %) for CIFAR10, (98.24%, 83.30%) for SVHN, and (99.89%, 93.07%) for QMNIST with a subset size of 30%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.