The scarcity of large, well-labeled datasets poses a significant challenge in many medical imaging tasks. Even when sufficient data are available, labeling them accurately is arduous and time-consuming and requires expert skill. Class imbalance compounds these problems and presents a considerable challenge for many machine learning algorithms. In light of this, algorithms that can exploit large amounts of unlabeled data together with a small amount of labeled data, while remaining robust to data imbalance, offer promising prospects for building highly efficient classifiers. This work proposes a semisupervised learning method that integrates self-training and self-paced learning to generate and select pseudolabeled samples for classifying breast cancer histopathological images. A novel pseudolabel generation and selection algorithm is introduced in the learning scheme to generate and select highly confident pseudolabeled samples from both well-represented and less-represented classes. This approach improves performance by jointly learning a model and optimizing the generation of pseudolabels on unlabeled target data, augmenting the training data and retraining the model with the generated labels. A class balancing framework that normalizes the class-wise confidence scores is also proposed to prevent the model from ignoring samples from less-represented classes (hard-to-learn samples), thereby handling data imbalance effectively. Extensive experimental evaluation on the BreakHis dataset demonstrates the effectiveness of the proposed method.
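The class balancing idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact normalization, so this example assumes one simple choice: each pseudolabel's confidence is divided by the highest confidence observed for its class, and the threshold is applied to the normalized value. The function name `select_class_balanced` and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def select_class_balanced(probs, threshold=0.9):
    """Return (sample_index, pseudolabel) pairs whose class-normalized
    confidence is at least `threshold`.

    `probs` is a list of per-sample class probability lists.
    Assumption: normalization divides by the per-class maximum confidence.
    """
    # Pseudolabel = most probable class; confidence = its probability.
    preds = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)
        preds.append((i, label, p[label]))

    # Reference confidence per class: the highest confidence seen for it.
    class_max = defaultdict(float)
    for _, label, conf in preds:
        class_max[label] = max(class_max[label], conf)

    # Keep samples that are confident relative to their own class.
    return [
        (i, label)
        for i, label, conf in preds
        if conf / class_max[label] >= threshold
    ]


if __name__ == "__main__":
    # Class 1 is the hard class: its raw confidences never reach 0.9.
    probs = [[0.98, 0.02], [0.90, 0.10], [0.60, 0.40],
             [0.45, 0.55], [0.48, 0.52]]
    print(select_class_balanced(probs))
```

With a single global threshold of 0.9 on the raw confidences, only samples predicted as class 0 would be kept in this toy input. Normalizing per class lets the less confident class contribute pseudolabels as well.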
This study proposes a self-paced learning scheme that integrates self-training and deep learning to select and learn from labeled and unlabeled samples for classifying anterior-posterior chest images as either pneumonia-infected or normal. In this approach, a model is first trained on labeled data and then evaluated on unlabeled data to generate pseudo-labels. Using a novel selection scheme, pseudo-labeled samples are then selected to update the model in the next training iteration of the semisupervised training process. The images selected for the next iteration are those with the most confident predicted probabilities in every class. Such a selection scheme prevents mistake reinforcement, which is prevalent in self-training. Because deep models tend to latch onto samples from well-represented classes while ignoring less transferable and less-represented classes, especially when the data are unbalanced, the proposed method uses a novel algorithm to generate and select reliable top-K pseudo-labeled samples for updating the model during the next training phase. This approach not only forces the model to learn the hard samples in the training data but also enlarges the training set, generating enough samples to satisfy the data hunger of deep models. Extensive experimental evaluation shows that the proposed method yields higher accuracy than methods reported in the literature on the same dataset, indicating its effectiveness.
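The train, pseudo-label, select, and update cycle described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: `top_k_per_class`, `self_training_round`, and the `predict_proba` callable are hypothetical names, and retraining the model between rounds is left to the caller.

```python
def top_k_per_class(probs, k):
    """For each predicted class, pick the k samples with the highest
    confidence. Returns (sample_index, pseudo_label) pairs."""
    by_class = {}
    for idx, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)
        by_class.setdefault(label, []).append((p[label], idx))

    selected = []
    for label in sorted(by_class):
        # Sort by descending confidence; Python's sort is stable,
        # so ties keep their original order.
        ranked = sorted(by_class[label], key=lambda t: -t[0])
        selected.extend((idx, label) for _, idx in ranked[:k])
    return selected


def self_training_round(labeled, unlabeled, predict_proba, k):
    """One round: pseudo-label the unlabeled pool, then move the top-k
    samples per class into the labeled set.

    `labeled` is a list of (sample, label) pairs, `unlabeled` a list of
    samples, and `predict_proba` maps a sample to class probabilities.
    """
    probs = [predict_proba(x) for x in unlabeled]
    picked = dict(top_k_per_class(probs, k))
    new_labeled = labeled + [
        (unlabeled[i], label) for i, label in sorted(picked.items())
    ]
    remaining = [x for i, x in enumerate(unlabeled) if i not in picked]
    return new_labeled, remaining


if __name__ == "__main__":
    # Toy stand-in for a trained model: the sample value is P(class 1).
    toy_model = lambda x: [1.0 - x, x]
    labeled, pool = self_training_round(
        [], [0.1, 0.2, 0.45, 0.7, 0.95], toy_model, k=1
    )
    print(labeled, pool)
```

Selecting a fixed number of samples from every class, instead of applying one global confidence cutoff, is what keeps the less-represented class in each update. In a full loop the caller would retrain on `labeled` and repeat until the pool is empty or a round limit is reached.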
Conventional approaches to breast cancer diagnosis have drawbacks that ultimately affect the quality of diagnosis and subsequent treatment, creating a need for automatic and precise classification of breast cancer tumors. The advent of deep learning methods has brought increasing interest in their application to many tasks. In particular, convolutional neural networks with transfer learning have achieved considerable success in many classification tasks. Nonetheless, with transfer learning, the sheer number of parameters in deep networks, coupled with the disparity between source and target data, leaves networks prone to overfitting, particularly when data are limited. Negative transfer may also occur when the source and target domains are unrelated. This work proposes a simple convolutional neural network model trained from scratch for discriminating benign and malignant breast cancer tumors in histopathological images. Four deep learning optimization algorithms are explored to ascertain how optimizers aid in finding parameter sets that minimize loss and increase overall classification accuracy. By adopting polynomial learning rate decay scheduling and several data augmentation techniques that curb overfitting and improve the generalization ability of the proposed model, accuracy, sensitivity, specificity, and Area Under the Curve values of 89.92%, 94.02%, 86.42%, and 0.884 (88.4%), respectively, are reported.
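As an illustration of polynomial learning rate decay, the sketch below uses the form commonly found in deep learning libraries: the rate falls from an initial value to an end value over a fixed number of steps, following a polynomial of a chosen power. The abstract does not state the schedule's hyperparameters, so the function name `polynomial_decay` and all values in the usage example are assumptions chosen for illustration only.

```python
def polynomial_decay(step, initial_lr, end_lr, decay_steps, power=1.0):
    """Learning rate at `step` under polynomial decay.

    lr = (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr

    After `decay_steps` the rate stays at `end_lr`.
    """
    step = min(step, decay_steps)          # clamp so the rate never rises again
    fraction = 1.0 - step / decay_steps    # goes from 1.0 down to 0.0
    return (initial_lr - end_lr) * fraction ** power + end_lr


if __name__ == "__main__":
    # Hypothetical settings, not taken from the paper.
    for step in (0, 25, 50, 75, 100, 150):
        lr = polynomial_decay(step, initial_lr=0.01, end_lr=0.0001,
                              decay_steps=100, power=2.0)
        print(f"step {step:3d}: lr = {lr:.6f}")
```

With `power=1.0` the schedule is a straight line between the two rates. Larger powers reduce the rate faster early in training and flatten out as it approaches `end_lr`.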