Background/aim
The complexity of tumor formation makes it difficult to identify discriminatory genes. Transcriptome-based supervised classification with support vector machines (SVMs) has recently become popular in this field. However, including less significant variables when building classification models can lead to misclassification. Feature selection methods such as enrichment analysis can improve model performance by extracting useful variable sets. Little is known about genes that discriminate between normal and tumor samples in the context of the association between cancer and disease. We therefore aimed to discover novel and practical sets of discriminatory biomarkers by exploiting this association.
Materials and methods
In this study, we applied SVM classification to differentially expressed genes enriched by Disease Ontology, filtering out nondiscriminatory features with the Wilks' lambda criterion prior to classification. Our approach treats the discovery of disease-associated genes as a viable strategy for identifying gene sets that discriminate between tumor and normal states. We evaluated our algorithm on comprehensive RNA-Seq data for colon adenocarcinoma, lung squamous cell carcinoma, and lung adenocarcinoma. The classification performance of the resulting gene sets was assessed on different expression datasets and compared with previous studies that used the same datasets.
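The filtering step can be illustrated with a minimal sketch, assuming a univariate form of Wilks' lambda (within-group sum of squares divided by total sum of squares) computed per gene. This is not the SVMDO implementation: the function names, the gene names, the toy expression values, and the 0.5 cutoff are all hypothetical and chosen only for illustration. Values near 0 indicate strong separation between tumor and normal samples, while values near 1 indicate no separation. The subsequent SVM classification step is not shown.

```python
from statistics import fmean


def wilks_lambda(values, labels):
    """Univariate Wilks' lambda: within-group SS divided by total SS."""
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        # A constant feature carries no group information.
        return 1.0
    within_ss = 0.0
    for group in set(labels):
        members = [v for v, g in zip(values, labels) if g == group]
        group_mean = fmean(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)
    return within_ss / total_ss


def filter_genes(expression, labels, threshold=0.5):
    """Keep genes whose lambda falls below a (hypothetical) cutoff,
    ordered from most to least discriminatory."""
    scores = {gene: wilks_lambda(vals, labels) for gene, vals in expression.items()}
    kept = [gene for gene, score in scores.items() if score < threshold]
    return sorted(kept, key=scores.get)


# Toy data (invented): three normal and three tumor samples.
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
expression = {
    "GENE_A": [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],  # clearly separated groups
    "GENE_B": [2.0, 3.0, 4.0, 2.0, 3.0, 4.0],  # identical group distributions
    "GENE_C": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],  # heavily overlapping groups
}

selected = filter_genes(expression, labels)
print(selected)
```

In this toy example only GENE_A passes the cutoff; the retained genes would then serve as input features for the classifier.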
Results
Our algorithm extracts small, stable gene sets that predict cancer status with high accuracy. In addition, the gene sets generated by our method perform well in survival analyses, indicating their potential for prognosis.
Conclusion
By combining gene sets for both diagnosis and prognosis, our method can improve clinical applications in cancer research. Our algorithm is available as an R package with a graphical user interface in Bioconductor (https://doi.org/10.18129/B9.bioc.SVMDO) and GitHub (https://github.com/robogeno/SVMDO).