Semi-supervised learning is a machine learning approach that tackles the challenge of having a large set of unlabeled data and only a few labeled samples. In this paper we adopt a semi-supervised self-training method to enlarge the training data, prevent overfitting, and improve the performance of deep models, proposing a novel selection algorithm that prevents mistake reinforcement, a common problem in conventional self-training. The model leverages unlabeled data as follows: after each training round, we first generate pseudo-labels on the unlabeled set, then select the top-k most-confident pseudo-labeled images from each class, add them with their pseudo-labels to the training data, and retrain the network on the updated training set. The method improves accuracy in two ways: it bridges the gap in the appearance of visual objects, and it enlarges the training set to meet the data demands of deep models. We demonstrate the effectiveness of the model through experiments on four benchmark fine-grained datasets: Stanford Dogs, Stanford Cars, Oxford 102 Flowers, and CUB-200-2011. We further evaluate the model on coarse-grained data. Experimental results show that the proposed framework outperforms previous work on the same data and obtains higher classification accuracy than most supervised learning models.
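A minimal sketch of the per-class top-k selection step described above, assuming softmax outputs on the unlabeled set; the function name, parameters, and NumPy interface are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def select_top_k_per_class(probs: np.ndarray, k: int):
    """Select the k most confident pseudo-labeled samples from each class.

    probs: (n_unlabeled, n_classes) softmax outputs on the unlabeled set.
    Returns indices into the unlabeled pool and their pseudo-labels.
    """
    pseudo_labels = probs.argmax(axis=1)   # hard pseudo-label per sample
    confidences = probs.max(axis=1)        # confidence of that label
    selected_idx, selected_lbl = [], []
    for c in range(probs.shape[1]):
        class_idx = np.flatnonzero(pseudo_labels == c)
        # keep only the k highest-confidence samples assigned to class c
        top = class_idx[np.argsort(-confidences[class_idx])[:k]]
        selected_idx.extend(top.tolist())
        selected_lbl.extend([c] * len(top))
    return np.asarray(selected_idx, dtype=int), np.asarray(selected_lbl, dtype=int)
```

Selecting per class rather than globally is what keeps a few dominant, easy classes from supplying all of the pseudo-labels in a given round.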
The unavailability of large amounts of well-labeled data poses a significant challenge in many medical imaging tasks. Even when sufficient data are available, accurately labeling them is an arduous, time-consuming process that requires expert skill. Moreover, unbalanced data further compound these problems and present a considerable challenge for many machine learning algorithms. In light of this, algorithms that can exploit large amounts of unlabeled data together with a small amount of labeled data, while remaining robust to data imbalance, offer promising prospects for building highly efficient classifiers. This work proposes a semi-supervised learning method that integrates self-training and self-paced learning to generate and select pseudo-labeled samples for classifying breast cancer histopathological images. A novel pseudo-label generation and selection algorithm is introduced in the learning scheme to generate and select highly confident pseudo-labeled samples from both well-represented and less-represented classes. This approach improves performance by jointly learning a model and optimizing the generation of pseudo-labels on unlabeled target data to augment the training set, then retraining the model with the generated labels. A class-balancing framework that normalizes the class-wise confidence scores is also proposed to prevent the model from ignoring samples from less-represented classes (hard-to-learn samples), thus effectively handling data imbalance. Extensive experimental evaluation on the BreakHis dataset demonstrates the effectiveness of the proposed method.
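The class-balancing idea, normalizing confidence scores class-wise so every class contributes samples, could look roughly like the following sketch; the per-class quantile threshold and the select_frac parameter are assumptions, not the paper's exact formulation:

```python
import numpy as np

def class_balanced_selection(probs: np.ndarray, select_frac: float = 0.2):
    """Pick confident pseudo-labels using a per-class threshold so that
    less-represented classes are not drowned out by easy, dominant ones."""
    pseudo = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    keep = np.zeros(len(probs), dtype=bool)
    for c in np.unique(pseudo):
        idx = np.flatnonzero(pseudo == c)
        # class-wise threshold: the (1 - select_frac) quantile of this
        # class's own confidences, so every class contributes samples
        thr = np.quantile(conf[idx], 1.0 - select_frac)
        keep[idx[conf[idx] >= thr]] = True
    return np.flatnonzero(keep), pseudo[keep]
```

Because each class is thresholded against its own confidence distribution, a hard-to-learn class with uniformly low scores still gets its top fraction selected instead of being filtered out by a single global cutoff.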
Traffic sign recognition is a classification problem that poses challenges for computer vision and machine learning algorithms. Although techniques in both fields have steadily improved, the rapid growth in the amount of unlabeled traffic-sign data has made the task even more challenging. Collecting and labeling data at scale are tedious, expensive tasks that demand considerable time, expert knowledge, and financial resources to satisfy the data hunger of deep neural networks. In addition, unbalanced data pose a further challenge for computer vision and machine learning algorithms seeking better performance. These problems raise the need for algorithms that can fully exploit a large amount of unlabeled data, use a small number of labeled samples, and remain robust to data imbalance in order to build an efficient, high-quality classifier. In this work, we propose a novel semi-supervised classification technique that is robust to small and unbalanced data. The framework integrates weakly-supervised learning and self-training with self-paced learning: it generates attention maps to augment the training set and uses a novel pseudo-label generation and selection algorithm to generate and select pseudo-labeled samples. The method improves performance by: (1) normalizing the class-wise confidence levels to prevent the model from ignoring hard-to-learn samples, thereby addressing the imbalanced-data problem; (2) jointly learning a model and optimizing the pseudo-labels generated on unlabeled data; and (3) enlarging the training set to satisfy the data hunger of deep learning models. Extensive evaluations on two public traffic sign recognition datasets demonstrate the effectiveness of the proposed technique and offer a potential solution for practical applications.
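One possible shape of the self-paced self-training loop (omitting the attention-map augmentation step) is sketched below: the fraction of admitted pseudo-labels grows each round, and labels are regenerated rather than accumulated. The scikit-learn-style fit/predict_proba interface and all parameter values are assumptions:

```python
import numpy as np

def self_paced_self_training(model, X_l, y_l, X_u, rounds=5,
                             start_frac=0.2, step=0.15):
    """Retrain on labeled data plus the currently selected pseudo-labels,
    regenerating and re-selecting them each round with a growing budget."""
    X_train, y_train = X_l, y_l
    for r in range(rounds):
        model.fit(X_train, y_train)
        probs = model.predict_proba(X_u)
        pseudo, conf = probs.argmax(axis=1), probs.max(axis=1)
        frac = min(start_frac + step * r, 1.0)  # pace: admit more each round
        keep = np.zeros(len(X_u), dtype=bool)
        for c in np.unique(pseudo):
            idx = np.flatnonzero(pseudo == c)
            thr = np.quantile(conf[idx], 1.0 - frac)  # class-wise threshold
            keep[idx[conf[idx] >= thr]] = True
        X_train = np.concatenate([X_l, X_u[keep]])
        y_train = np.concatenate([y_l, pseudo[keep]])
    return model
```

Re-selecting from scratch each round lets early, possibly wrong pseudo-labels be revised as the model improves, which is the joint learning-and-optimization behavior the abstract describes.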
This study proposes a self-paced learning scheme that integrates self-training and deep learning to select and learn from labeled and unlabeled samples for classifying anterior-posterior chest images as either pneumonia-infected or normal. In this approach, a model is first trained on the labeled data and then applied to the unlabeled data to generate pseudo-labels. Using a novel selection scheme, pseudo-labeled samples are then selected to update the model in the next iteration of the semi-supervised training process. The pseudo-labeled images added at each iteration are those with the most confident predictions from every class. Such a selection scheme prevents mistake reinforcement, a prevalent problem in self-training. Because deep models tend to latch onto well-represented classes while ignoring less-represented (and less transferable) ones, especially with unbalanced data, the proposed method uses a novel algorithm to generate and select reliable top-K pseudo-labeled samples for updating the model in the next training phase. This approach not only forces the model to learn the hard samples in the training data but also enlarges the training set with enough samples to satisfy the data hunger of deep models. Extensive experimental evaluation shows that the proposed method achieves higher accuracy than previously reported methods on the same dataset, indicating its effectiveness.
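A compact sketch of the iterative train, pseudo-label, select top-K per class, retrain cycle described above; the fit/predict_proba interface, K, and the iteration count are illustrative assumptions:

```python
import numpy as np

def self_training_top_k(model, X_l, y_l, X_u, k=100, iters=4):
    """Train, pseudo-label the unlabeled pool, keep the k most confident
    samples per class, fold them into the training set, and repeat."""
    for _ in range(iters):
        model.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        probs = model.predict_proba(X_u)
        pseudo, conf = probs.argmax(axis=1), probs.max(axis=1)
        keep = []
        for c in range(probs.shape[1]):  # e.g. two classes: normal, pneumonia
            idx = np.flatnonzero(pseudo == c)
            keep.extend(idx[np.argsort(-conf[idx])[:k]].tolist())
        keep = np.asarray(keep, dtype=int)
        X_l = np.concatenate([X_l, X_u[keep]])
        y_l = np.concatenate([y_l, pseudo[keep]])
        X_u = np.delete(X_u, keep, axis=0)  # shrink the unlabeled pool
    return model
```

Drawing the top-K from every class, rather than the top-K overall, is the element that limits mistake reinforcement: a confidently wrong majority class cannot monopolize the samples added in a round.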