Background
Non-small cell lung carcinoma (NSCLC) accounts for more than 80% of lung cancers and is highly heterogeneous, so the genetic heterogeneity and molecular subtypes of early-stage NSCLC should be explored.
Methods
The Partitioning Around Medoids (PAM) algorithm was used to identify molecular subtypes of early-stage NSCLC based on prognosis-related mRNAs and methylation sites. Random forest (RF) and support vector machine (SVM) classifiers were used to build prediction models for the subtypes.
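The clustering step can be illustrated with a minimal sketch of k-medoids clustering in the PAM style. It is not the authors' implementation: the feature matrix is synthetic, the distance is assumed to be Euclidean, the names `pam`, `total_cost`, and `samples` are hypothetical, and the swap phase accepts the first improving swap rather than searching for the best one, which is a simplification of classic PAM.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(dist, medoids):
    """Sum over all samples of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(points, k):
    """Return (medoid indices, cluster label per sample) for k clusters."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    # BUILD phase: greedily add the sample that lowers the total cost most.
    medoids = []
    while len(medoids) < k:
        best = min(
            (i for i in range(n) if i not in medoids),
            key=lambda i: total_cost(dist, medoids + [i]),
        )
        medoids.append(best)

    # SWAP phase (simplified): accept any medoid/non-medoid swap that
    # strictly lowers the total cost, and stop when no swap helps.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True

    # Assign each sample to its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Synthetic stand-in for a samples x features matrix (assumption: in the
# study, features would be prognosis-related mRNA and methylation values).
random.seed(0)
group_a = [[random.gauss(0, 1) for _ in range(5)] for _ in range(20)]
group_b = [[random.gauss(8, 1) for _ in range(5)] for _ in range(20)]
samples = group_a + group_b

medoids, labels = pam(samples, k=2)
print("medoid sample indices:", medoids)
print("cluster sizes:", [labels.count(c) for c in range(2)])
```

Each medoid is an actual sample, so every cluster has a representative patient profile, which is one reason medoid-based methods are often preferred over centroid-based ones for omics data.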
Results
Six prognosis-related subtypes of early-stage NSCLC were identified: 4 for lung squamous cell carcinoma (LUSC) and 2 for lung adenocarcinoma (LUAD). LUSC-C1 and LUAD-C2 showed highly expressed and hypermethylated gene regions, LUAD-C1 showed a highly expressed region, and LUSC-C3 and LUSC-C4 showed hypermethylated regions. LUSC subtypes were mainly determined by DNA methylation (14 mRNAs and 362 methylation sites), whereas LUAD subtypes were determined by both mRNA and DNA methylation information (143 mRNAs and 458 methylation sites). Ten methylation sites were selected as biomarkers for prediction of LUSC-C1 and LUSC-C3, respectively. Nine genes and 1 methylation site were selected as biomarkers for LUAD subtype prediction. These subtypes can be predicted from the selected biomarkers with RF and SVM models.
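Predicting a subtype from a small biomarker panel with RF and SVM can be sketched as follows. This is a hedged illustration using scikit-learn on synthetic data, not the study's models or data: the ten features only mimic the size of a ten-marker panel, and the hyperparameters, the RBF kernel, the feature scaling, and the 5-fold cross-validation are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 200 samples, 10 "biomarker" features, binary label
# (e.g. "belongs to one subtype" vs "does not"). Not real patient data.
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=6,
    n_redundant=0,
    n_classes=2,
    class_sep=2.0,
    random_state=0,
)

models = {
    # Random forest: tree ensemble, insensitive to feature scaling.
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    # SVM: scale features first, since the RBF kernel is distance-based.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Evaluating both models with the same cross-validation folds gives a like-for-like comparison; in practice an independent validation cohort would be a stronger check than cross-validation alone.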
Conclusions
We propose a prognosis-related molecular subtyping of early-stage NSCLC, which can provide important information for the personalized therapy of patients.