The lack of labeled data is one of the main obstacles to applying machine learning algorithms in a variety of domains. Semi-supervised learning, in which additional samples are labeled automatically, is a common and cost-effective way to address this challenge. A popular semi-supervised labeling approach is co-training, in which two views of the data (obtained by training two learning models on different feature subsets) iteratively provide each other with newly labeled samples. Although effective in many cases, existing co-training algorithms often suffer from low labeling accuracy and rely on a heuristic sample-selection strategy, both of which hurt their performance. We propose Co-training using Meta-learning (CoMet), a novel approach that addresses many of the shortcomings of existing co-training methods. Instead of greedily labeling individual samples, CoMet evaluates batches of samples and can therefore select samples that complement each other. In addition, CoMet uses meta-learning to leverage insights gained from previously evaluated datasets and apply them to other datasets. An extensive evaluation on 35 datasets shows that CoMet significantly outperforms other leading co-training approaches, particularly when very little labeled data is available. Moreover, our analysis shows that CoMet's labeling accuracy and consistency of performance are superior to those of existing approaches.

INDEX TERMS Co-training, semi-supervised learning, meta-learning
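To make the baseline concrete, the following is a minimal sketch of classic greedy co-training, the kind of per-sample selection the abstract contrasts CoMet with. It is not CoMet itself. Assumptions for illustration only: each view is a nearest-centroid classifier over a hypothetical feature subset (`view_a`, `view_b`), "confidence" is the distance gap between the two nearest class centroids, and the helper names `fit_centroids`, `predict`, and `co_train` are hypothetical.

```python
import math


def fit_centroids(X, y, view):
    """Fit a nearest-centroid classifier using only the feature indices in `view`."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        v = [x[f] for f in view]
        if label not in sums:
            sums[label] = [0.0] * len(view)
            counts[label] = 0
        sums[label] = [s + a for s, a in zip(sums[label], v)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x, view):
    """Return (label, confidence); confidence = gap between the two nearest centroids."""
    v = [x[f] for f in view]
    ranked = sorted((math.dist(v, mu), c) for c, mu in centroids.items())
    best_dist, best_label = ranked[0]
    margin = ranked[1][0] - best_dist if len(ranked) > 1 else math.inf
    return best_label, margin


def co_train(X_lab, y_lab, X_unlab, view_a, view_b, per_round=1, max_rounds=10):
    """Classic greedy co-training: each view, in turn, pseudo-labels its most
    confident pool samples and adds them to the shared labeled set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        for view in (view_a, view_b):
            if not pool:
                return X_lab, y_lab
            model = fit_centroids(X_lab, y_lab, view)
            scored = [(predict(model, x, view), i) for i, x in enumerate(pool)]
            # Greedy, per-sample selection: samples are ranked independently by
            # confidence, with no check on how well they complement each other.
            scored.sort(key=lambda item: item[0][1], reverse=True)
            chosen = scored[:per_round]
            for (label, _), i in chosen:
                X_lab.append(pool[i])
                y_lab.append(label)
            # Remove chosen samples from the pool, highest index first.
            for i in sorted((i for _, i in chosen), reverse=True):
                pool.pop(i)
    return X_lab, y_lab


# Toy usage: features 0-1 form one view, features 2-3 the other.
labeled = [(0.0, 0.0, 0.0, 0.0), (5.0, 5.0, 5.0, 5.0)]
labels = [0, 1]
unlabeled = [(0.5, 0.2, 0.1, 0.4), (4.8, 5.1, 5.2, 4.9),
             (1.0, 0.0, 0.5, 0.5), (4.0, 5.0, 4.5, 5.0)]
X, y = co_train(labeled, labels, unlabeled, view_a=[0, 1], view_b=[2, 3])
print(list(zip(X, y)))
```

The selection step is where the two approaches differ: this sketch picks the top-`per_round` samples one by one, whereas CoMet, as described above, scores candidate batches so that the chosen samples complement each other.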