Background This study proposes a breast cancer prediction model for early diagnosis and prognosis management. Objective To explore the pathogenesis of breast cancer and develop accurate screening and treatment methods, we applied machine-learning techniques to breast cancer genetic data to derive a new gene signature and prognostic prediction models. Methods We identified an optimal clustering by applying unsupervised clustering methods to the differentially expressed genes (DEGs) between normal (n = 113) and tumour (n = 1,102) samples. Using least absolute shrinkage and selection operator (LASSO) regression, we selected four biomarkers, built a predictive model by Cox regression in the training set (n = 1,083), and validated its predictive accuracy and independence in the testing sets (n = 2,480). Gene Set Enrichment Analysis (GSEA) then revealed the biological pathways enriched in each cluster. Finally, we constructed a nomogram combining this signature with other significant risk factors to predict patient survival rates. Results Four mRNAs (CD163L1, QPRT, NKAIN1 and TP53AIP1) distinguishing the two clusters were identified from 4,938 DEGs, and a four-gene model (risk score = 0.454 × CD163L1 − 0.360 × NKAIN1 + 0.581 × QPRT + 0.788 × TP53AIP1) was established that divided patients into high- and low-risk groups with significantly different prognosis (p < 0.0001) in the training set. Integrated analysis revealed dysregulated molecular processes, predominantly oncogenic signalling pathways, cell cycle and DNA repair, in the high-risk group, whereas metabolic pathways were enriched in the low-risk group. The model had similar predictive value (HR > 1.60; p < 0.05) in three independent validation sets and predicted survival independently, with more power than any single clinical factor. The nomogram also predicted the prognosis of breast cancer patients precisely in the training set and the three testing sets. Conclusion This model can predict the prognosis of breast cancer patients precisely and independently, and provides evidence to support treatment decisions and the design of clinical trials.
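The four-gene risk score reported above is a plain linear combination, so it can be illustrated with a short sketch. The coefficients below are the ones stated in the abstract; everything else is an assumption made for illustration: the function names `risk_score` and `stratify`, the use of the cohort median as the high/low cutoff (the abstract does not state how the cutoff was chosen), and the expression values, which are invented and not taken from any dataset.

```python
from statistics import median

# Coefficients as reported in the abstract's four-gene model.
COEFFICIENTS = {
    "CD163L1": 0.454,
    "NKAIN1": -0.360,
    "QPRT": 0.581,
    "TP53AIP1": 0.788,
}


def risk_score(expression):
    """Weighted sum of the four genes' expression values."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(patients, cutoff=None):
    """Label each patient 'high' or 'low' risk.

    Assumption: when no cutoff is given, the cohort median score is used.
    The abstract does not specify the cutoff actually applied.
    """
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values, for illustration only.
patients = {
    "P1": {"CD163L1": 1.0, "NKAIN1": 1.0, "QPRT": 1.0, "TP53AIP1": 1.0},
    "P2": {"CD163L1": 0.0, "NKAIN1": 0.0, "QPRT": 0.0, "TP53AIP1": 0.0},
    "P3": {"CD163L1": 0.0, "NKAIN1": 1.0, "QPRT": 0.0, "TP53AIP1": 0.0},
}

groups = stratify(patients)
```

Note that NKAIN1 carries a negative coefficient, so higher NKAIN1 expression lowers the score while the other three genes raise it.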
Background. No predictive model has yet been developed from the signaling pathway signatures of immune-related lncRNAs in breast cancer (BRCA) patients. Methods. We used an unsupervised hierarchical clustering algorithm to classify patients with BRCA based on the significant immune-derived lncRNAs in the TCGA dataset. Several methods, including ESTIMATE, ImmuneCellAI, and CIBERSORT, were used to evaluate immune infiltration of the tumor microenvironment. Using the Lasso regression algorithm, we filtered the significant signaling pathways enriched by GSEA, GSVA, or PPI analysis to develop a prognostic model. A nomogram integrating clinical factors and significant pathways was constructed to predict the probability of overall survival (OS) of BRCA patients in the TCGA dataset (n = 1,098) and two additional testing sets (n = 415). Results. Using 550 immune-related lncRNAs in the TCGA dataset, BRCA patients were stratified into PC (n = 571) and GC (n = 527) subgroups with significantly different prognosis. Integrated analysis revealed different immune response, oncogenic signaling, and metabolic reprogramming pathways between the two subgroups. A 5-pathway signature independently predicted the prognosis of BRCA patients between these two subgroups in the TCGA dataset, which was confirmed in two further cohorts from the GEO dataset. In the TCGA dataset, the 5-year OS rate was 78% (95% CI: 73–84) vs. 82% (95% CI: 77–87) for the PC and GC groups, respectively (HR = 1.63 (95% CI: 1.17–2.28), p = 0.004). The predictive power was similar in the two testing sets (HR > 1.20, p < 0.01). Finally, a nomogram integrating this signature and age was developed for clinical application to predict survival probability in BRCA patients. Conclusion. This 5-pathway signature, correlated with immune-derived lncRNAs, precisely predicted the prognosis of patients with BRCA, provides a rich resource for characterizing immune-related lncRNAs, and can inform strategies to target BRCA vulnerabilities.