Breast cancer is the most common cancer in women. A better understanding of risk factors plays a central role in disease prediction and prevention. We aimed to identify potential novel risk factors for breast cancer among post-menopausal women, with pre-specified interest in the role of polygenic risk scores (PRS) for risk prediction. We designed an analysis pipeline combining machine learning (ML) and classical statistical models, with emphasis on necessary statistical considerations (e.g. collinearity, missing data). An extreme gradient boosting (XGBoost) machine with Shapley (SHAP) feature importance measures was used for risk factor discovery among ~1.7k features in 104,313 post-menopausal women from the UK Biobank cohort. Cox models were subsequently constructed for in-depth investigation. Both PRS were significant risk factors when fitted simultaneously in both ML and Cox models (p<0.001). ML analyses identified 11 novel predictors (excluding the two PRS), among which five were confirmed by the Cox models: plasma urea (HR=0.95, 95% CI 0.92-0.98, p<0.001) and plasma phosphate (HR=0.67, 95% CI 0.52-0.88, p=0.003) were inversely associated with risk of developing post-menopausal breast cancer, whereas basal metabolic rate (HR=1.15, 95% CI 1.08-1.22, p<0.001), red blood cell count (HR=1.20, 95% CI 1.08-1.34, p=0.001), and creatinine in urine (HR=1.05, 95% CI 1.01-1.09, p=0.008) were positively associated. Our final Cox model demonstrated a slight improvement in risk discrimination when adding novel features to a simpler Cox model containing PRS and the established risk factors (Harrell's C-index = 0.670 vs 0.665).
Objective To externally evaluate the performance of QRISK3 for predicting 10 year risk of cardiovascular disease (CVD) in the UK Biobank cohort. Methods We used data from the UK Biobank, a large-scale prospective cohort study of 403 370 participants aged 40–69 years recruited between 2006 and 2010 in the UK. We included participants with no previous history of CVD or statin treatment and defined the outcome to be the first occurrence of coronary heart disease, ischaemic stroke or transient ischaemic attack, derived from linked hospital inpatient records and death registrations. Results Our study population included 233 233 women and 170 137 men, with 9295 and 13 028 incident CVD events, respectively. Overall, QRISK3 had moderate discrimination for UK Biobank participants (Harrell’s C-statistic 0.722 in women and 0.697 in men) and discrimination declined by age (<0.62 in all participants aged 65 years or older). QRISK3 systematically overpredicted CVD risk in UK Biobank, particularly in older participants, by as much as 20%. Conclusions QRISK3 had moderate overall discrimination in UK Biobank, which was best in younger participants. The observed CVD risk for UK Biobank participants was lower than that predicted by QRISK3, particularly for older participants. It may be necessary to recalibrate QRISK3 or use an alternate model in studies that require accurate CVD risk prediction in UK Biobank.
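The discrimination figures quoted in these abstracts are Harrell's C values. As a rough illustration of what that statistic measures, here is a minimal pure-Python sketch. The function name `harrell_c` and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification rather than the handling used in any of the cited studies.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Share of comparable pairs in which the higher-risk subject fails first.

    times: follow-up time per subject
    events: 1 if the event was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Tied times are skipped here (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): five subjects, two of them censored.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.6, 0.7, 0.3, 0.1]
print(harrell_c(toy_times, toy_events, toy_risk))  # 0.875
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why values around 0.70 are described as moderate discrimination.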
Objective To investigate whether genetic risk of type 2 diabetes modifies associations between body mass index (BMI) and first degree family history of diabetes with 1) prevalent pre-diabetes or undiagnosed diabetes; and 2) incident confirmed type 2 diabetes. Methods We included 431,658 participants of multi-ethnic ancestry from the UK Biobank, aged 40-69 years at baseline. We used a multi-ethnic polygenic risk score for type 2 diabetes (PRST2D) developed by Genomics PLC. Pre-diabetes or undiagnosed diabetes was defined as baseline HbA1c ≥ 42 mmol/mol (6.0%) and incident type 2 diabetes was derived from medical records. Results At baseline, 43,472 participants had pre-diabetes or undiagnosed diabetes, and 17,259 developed type 2 diabetes over 15 years of follow-up. Dose-response associations were observed for PRST2D with each outcome in each category of BMI or first degree family history of diabetes. Those in the highest quintile of PRST2D with a normal BMI were at similar risk to those in the middle quintile who were overweight. Participants who were in the highest quintile of PRST2D and did not have a first degree family history of diabetes were at similar risk to those with a family history who were in the middle category of PRST2D. Conclusions Genetic risk of type 2 diabetes remains strongly associated with risk of pre-diabetes, undiagnosed diabetes and future type 2 diabetes within categories of non-genetic risk factors. This could have important implications for identifying individuals at risk of type 2 diabetes for prevention and early diagnosis programmes.
We aimed to identify potential novel predictors for breast cancer among post-menopausal women, with pre-specified interest in the role of polygenic risk scores (PRS) for risk prediction. We utilised an analysis pipeline where machine learning was used for feature selection, prior to risk prediction by classical statistical models. An “extreme gradient boosting” (XGBoost) machine with Shapley feature-importance measures was used for feature selection among ≈ 1.7k features in 104,313 post-menopausal women from the UK Biobank. We constructed and compared the “augmented” Cox model (incorporating the two PRS, known and novel predictors) with a “baseline” Cox model (incorporating the two PRS and known predictors) for risk prediction. Both PRS were significant in the augmented Cox model (p < 0.001). XGBoost identified 10 novel features, among which five showed significant associations with post-menopausal breast cancer: plasma urea (HR = 0.95, 95% CI 0.92–0.98, p < 0.001), plasma phosphate (HR = 0.68, 95% CI 0.53–0.88, p = 0.003), basal metabolic rate (HR = 1.17, 95% CI 1.11–1.24, p < 0.001), red blood cell count (HR = 1.21, 95% CI 1.08–1.35, p < 0.001), and creatinine in urine (HR = 1.05, 95% CI 1.01–1.09, p = 0.006). Risk discrimination was maintained in the augmented Cox model, yielding C-index 0.673 vs 0.667 (baseline Cox model) with the training data and 0.665 vs 0.664 with the test data. We identified blood/urine biomarkers as potential novel predictors for post-menopausal breast cancer. Our findings provide new insights into breast cancer risk. Future research should validate the novel predictors and investigate the use of multiple PRS and more precise anthropometry measures for better breast cancer risk prediction.
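The abstract does not state how the Shapley values were turned into a shortlist of features. One common summary is the mean absolute SHAP value per feature, and the sketch below ranks features that way. It assumes the per-sample SHAP values have already been computed elsewhere; the function name `rank_by_mean_abs_shap`, the feature names, the toy values, and the `top_k` cutoff are all hypothetical and not taken from the study.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names, top_k):
    """Rank features by mean absolute SHAP value and keep the top_k.

    shap_rows: one list of SHAP values per sample, aligned with feature_names
    Returns a list of (feature_name, mean_abs_shap) tuples, largest first.
    """
    n_samples = len(shap_rows)
    importance = {
        name: sum(abs(row[col]) for row in shap_rows) / n_samples
        for col, name in enumerate(feature_names)
    }
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy SHAP matrix (hypothetical values): 3 samples x 3 features.
toy_features = ["prs_a", "urea", "noise"]
toy_shap = [
    [0.4, -0.1, 0.00],
    [-0.6, 0.2, 0.01],
    [0.5, -0.3, -0.02],
]
selected = rank_by_mean_abs_shap(toy_shap, toy_features, top_k=2)
print([name for name, _ in selected])  # ['prs_a', 'urea']
```

In a pipeline of the kind described, a shortlist like `selected` would then be carried into the Cox models, where each candidate is tested alongside the known predictors.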