Background: Optimal treatment for cancer patients relies on accurate pathological diagnosis. The World Health Organization has defined almost 100 known central nervous system (CNS) tumor types, which makes histopathological diagnosis particularly challenging. To reduce variability and validation costs and to standardize the histopathological diagnostic process, supervised machine learning techniques using epigenetic data have been developed. These methods require large labeled training data sets to reach clinically acceptable classification accuracy. However, labeling pathology data for machine learning models is time-consuming and resource-intensive, especially for rare tumor types. At the same time, an abundance of unlabeled epigenetic data is now available across multiple databases. To make use of both labeled and unlabeled data, semi-supervised learning (SSL) approaches have been proposed and shown to be effective in genomics. However, SSL has not yet been explored with epigenetic data, nor has it been shown to benefit CNS tumor classification.

Results: This paper explores the application of semi-supervised machine learning to methylation data to improve the accuracy of supervised models in classifying CNS tumors. We comprehensively evaluated 11 SSL methods and developed a novel combination approach: self-training with editing using a support vector machine as the base learner (SETRED-SVM), followed by an L2-penalized multinomial logistic regression model, to obtain high-confidence labels from a small number of labeled instances.
Results across eight differently trained random forest and neural network models show that SETRED-SVM followed by multinomial logistic regression produces a high-confidence training data set. This leads to a statistically significant increase in prediction accuracy across 82 CNS tumor types.

Conclusions: The proposed combination of a semi-supervised technique and multinomial logistic regression has the potential to leverage the abundant, publicly available unlabeled methylation data effectively. Such an approach is highly beneficial for providing additional training examples, especially for scarce tumor types, and thereby boosts the prediction accuracy of supervised models.
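The abstract does not give implementation details. The following is therefore only a minimal sketch, under stated assumptions, of how a SETRED-style self-training loop with an SVM base learner could be chained with an L2-penalized multinomial logistic regression filter. It uses scikit-learn and NumPy. Several elements are hypothetical illustration choices and not the authors' method:

- the function names `setred_svm` and `lr_confident`;
- the k-nearest-neighbour graph with 1/(1 + distance) edge weights (standing in for the relative neighbourhood graph of the original SETRED);
- the batch size, the z-score cutoff and the probability threshold;
- the rule that a pseudo-label survives only if a logistic regression fit on the labeled data agrees with it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC


def setred_svm(X_lab, y_lab, X_unlab, max_iter=10, per_iter=20, k=5, z_crit=-1.645):
    """Self-training with editing (SETRED) around a linear SVM.

    Hypothetical simplifications: a k-NN graph with 1/(1 + distance) edge
    weights replaces the relative neighbourhood graph, and rejected candidates
    are dropped instead of being returned to the unlabeled pool.
    """
    X_l, y_l, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    classes, counts = np.unique(y_lab, return_counts=True)
    prior = dict(zip(classes, counts / counts.sum()))  # class proportions in labeled data
    X_ps, y_ps = [], []
    svm = SVC(kernel="linear")
    for _ in range(max_iter):
        if len(pool) == 0:
            break
        svm.fit(X_l, y_l)
        # Confidence = largest one-vs-rest score (|score| in the two-class case).
        scores = svm.decision_function(pool)
        conf = np.abs(scores) if scores.ndim == 1 else scores.max(axis=1)
        top = np.argsort(-conf)[:per_iter]
        X_new, y_new = pool[top], svm.predict(pool[top])
        pool = np.delete(pool, top, axis=0)

        # Editing step: cut-edge weight statistic on a k-NN graph over L plus candidates.
        X_all = np.vstack([X_l, X_new])
        y_all = np.concatenate([y_l, y_new])
        dist, idx = NearestNeighbors(n_neighbors=k).fit(X_all).kneighbors()  # excludes self
        dist, idx = dist[len(X_l):], idx[len(X_l):]  # keep rows of the candidates only
        w = 1.0 / (1.0 + dist)
        cut = (y_all[idx] != y_new[:, None]).astype(float)
        J = (w * cut).sum(axis=1)  # weight of edges to differently labeled neighbours
        p = np.array([prior[c] for c in y_new])
        mu = (1.0 - p) * w.sum(axis=1)  # expected J if labels were random
        sigma = np.sqrt(np.maximum(p * (1.0 - p) * (w ** 2).sum(axis=1), 1e-12))
        keep = (J - mu) / sigma < z_crit  # one-sided test, roughly the 5% level
        X_l = np.vstack([X_l, X_new[keep]])
        y_l = np.concatenate([y_l, y_new[keep]])
        X_ps.append(X_new[keep])
        y_ps.append(y_new[keep])
    if not X_ps:
        return np.empty((0, X_lab.shape[1])), np.empty(0, dtype=y_lab.dtype)
    return np.vstack(X_ps), np.concatenate(y_ps)


def lr_confident(X_lab, y_lab, X_ps, y_ps, threshold=0.9, C=1.0):
    """Keep pseudo-labels that a logistic regression fit on the labeled data
    reproduces with probability >= threshold (hypothetical filtering rule)."""
    if len(X_ps) == 0:
        return X_ps, y_ps
    # Default lbfgs solver: L2 penalty, multinomial loss when there are >2 classes.
    lr = LogisticRegression(C=C, max_iter=1000).fit(X_lab, y_lab)
    proba = lr.predict_proba(X_ps)
    agree = lr.classes_[proba.argmax(axis=1)] == y_ps
    keep = agree & (proba.max(axis=1) >= threshold)
    return X_ps[keep], y_ps[keep]
```

In this sketch, `X_lab`, `y_lab` and `X_unlab` would hypothetically be methylation matrices with one sample per row. The filtered pair `X_hc, y_hc = lr_confident(X_lab, y_lab, *setred_svm(X_lab, y_lab, X_unlab))` would then be appended to the labeled set before training a downstream random forest or neural network.

The editing step is what separates SETRED from plain self-training. A candidate is accepted only if its neighbours share its predicted label significantly more often than chance would predict. The second, independently trained model then discards pseudo-labels it is not confident about.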