Background
Alternative splicing (AS) contributes substantially to transcriptome and proteome diversity, and its dysregulation is closely associated with oncogenic processes. This study aimed to evaluate AS-based biomarkers for lung squamous cell carcinoma (LUSC) using machine learning algorithms.
Method
Data were obtained from The Cancer Genome Atlas (TCGA) and TCGA SpliceSeq databases. After balancing the data composition, Boruta feature selection and Spearman correlation analysis were used to select differentially expressed AS events. Random forests with nested fivefold cross-validation were applied to build a lymph node metastasis (LNM) classifier. A random survival forest combined with a Cox regression model was used to construct a prognostic model, from which a nomogram was developed. Functional enrichment analysis and Spearman correlation analysis were also conducted to explore the underlying mechanisms. The expression of selected switch-involved AS events and their parent genes was verified by qRT-PCR in 20 pairs of normal and LUSC tissues.
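As an illustration of the nested fivefold cross-validation described above, the following is a minimal sketch using scikit-learn, not the authors' actual pipeline. The synthetic data from `make_classification`, the hyperparameter grid, and the ROC AUC scoring are all assumptions made for the example; the Boruta selection and data balancing steps are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for an AS-event matrix (rows: patients, columns: AS events).
# Assumption: the real input would be PSI values with LNM status as the label.
X, y = make_classification(
    n_samples=150, n_features=30, n_informative=6, random_state=0
)

# Inner loop tunes hyperparameters; outer loop estimates generalization.
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Hypothetical grid, chosen only to keep the example small.
param_grid = {"n_estimators": [50, 100], "max_features": ["sqrt", 0.3]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=inner_cv,
    scoring="roc_auc",
)

# Each outer fold refits the whole inner search on its own training part,
# so the held-out fold is never seen during tuning.
outer_auc = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print("Outer-fold AUCs:", outer_auc.round(3))
print("Mean AUC:", outer_auc.mean().round(3))
```

Keeping tuning inside the inner loop means the outer-fold scores are not biased by hyperparameter selection, which is the usual motivation for nesting.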
Results
We identified 16 pairs of splicing events, each pair from the same parent gene, that were strongly related to the splicing switch (intrapair correlation coefficient = −1). We then built a reliable LNM classifier based on 13 AS events, as well as a prognostic model, in which switched AS events featured prominently. The qRT-PCR results were consistent with the bioinformatics analysis, and some AS events, such as ITIH5-10715-AT and QKI-78404-AT, showed remarkable detection efficiency for LUSC.
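To make the intrapair correlation coefficient of −1 concrete, the sketch below computes a Spearman correlation for two hypothetical AS events from one parent gene. The PSI (percent spliced in) values are invented for illustration, and the assumption that the two events are complementary (their PSI values sum to 1 in every sample) is ours, not a statement about the study data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical PSI values across five samples for two events of one gene.
psi_event_a = [0.10, 0.35, 0.62, 0.80, 0.93]
psi_event_b = [1.0 - p for p in psi_event_a]  # assumed complementary event

rho = spearman(psi_event_a, psi_event_b)
print("Intrapair Spearman correlation:", rho)
```

Whenever one event rises in exact step with the other falling, the rank orders are fully reversed and the coefficient reaches its minimum, which is the pattern a splicing switch would produce.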
Conclusion
AS events, especially switched ones from the same parent genes, could provide new insights into the molecular diagnosis and therapeutic drug design of LUSC.