Background:
Population aging is an increasingly acute challenge for countries around the world. One manifestation of this trend is the burden osteoporosis places on individuals and on national health systems. Earlier studies of osteoporosis risk factors relied on traditional statistical methods, and more recent work has turned to machine learning. Most of these efforts, however, treat the target variable (bone mineral density or fracture rate) as categorical, which discards quantitative information. The present study applies five machine learning methods to the risk factors for the bone mineral density T-score, with two aims: (1) to compare the prediction accuracy of these methods with that of traditional multiple linear regression, and (2) to rank the importance of 25 risk factors.
Methods:
The study sample comprised 24,412 women aged over 55 years, with 25 related variables. We applied traditional multiple linear regression (MLR) and five machine learning methods: classification and regression tree (CART), Naïve Bayes (NB), random forest (RF), stochastic gradient boosting (SGB), and eXtreme Gradient Boosting (XGBoost). Model performance was compared using four metrics: symmetric mean absolute percentage error (SMAPE), relative absolute error (RAE), root relative squared error (RRSE), and root mean squared error (RMSE).
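The four error metrics can be computed as in the minimal Python sketch below. The abstract gives no formulas, so the definitions are assumptions: SMAPE uses the common variant with (|y| + |ŷ|)/2 in the denominator, expressed as a percentage, and RAE and RRSE are measured against a baseline that always predicts the mean observed T-score. The function name `prediction_errors` is hypothetical.

```python
import math
from typing import Dict, Sequence


def prediction_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return SMAPE (%), RAE, RRSE and RMSE for paired observed/predicted T-scores."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

    # SMAPE (assumed variant): mean of |y - y_hat| / ((|y| + |y_hat|) / 2), in percent.
    # A pair where both values are 0 contributes 0 instead of dividing by zero.
    smape_terms = [
        0.0 if abs(t) + abs(p) == 0 else e / ((abs(t) + abs(p)) / 2)
        for t, p, e in zip(y_true, y_pred, abs_err)
    ]
    smape = 100.0 * sum(smape_terms) / n

    # Baseline that always predicts the mean observed value (used by RAE and RRSE).
    base_abs = sum(abs(t - mean_true) for t in y_true)
    base_sq = sum((t - mean_true) ** 2 for t in y_true)
    if base_abs == 0:
        raise ValueError("RAE and RRSE are undefined when all observed values are equal")

    rae = sum(abs_err) / base_abs            # relative absolute error
    rrse = math.sqrt(sum(sq_err) / base_sq)  # root relative squared error
    rmse = math.sqrt(sum(sq_err) / n)        # root mean squared error (T-score units)
    return {"SMAPE": smape, "RAE": rae, "RRSE": rrse, "RMSE": rmse}
```

For example, `prediction_errors([-1.0, -2.0, -3.0], [-1.0, -2.0, -2.0])` gives an RAE of 0.5. RAE and RRSE values below 1 mean the model beats the mean-only baseline, and using absolute values in the SMAPE denominator keeps it well defined for negative T-scores.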
Results:
Machine learning approaches outperformed MLR on all four error metrics. The average importance ranking of each factor across the machine learning methods indicates that age is the most important factor determining the T-score, followed by estimated glomerular filtration rate (eGFR), body mass index (BMI), uric acid (UA), and education level.
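The abstract does not say how the per-model importance rankings were combined. One plausible approach, sketched here purely as an assumption, is to rank the factors within each model (1 = most important) and then average each factor's rank across models. The function `average_importance_rank` and the toy scores are hypothetical illustrations, not study data.

```python
from statistics import mean
from typing import Dict, List, Tuple


def average_importance_rank(
    importances: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Rank factors within each model (1 = most important), then average the ranks.

    `importances` maps a model name to {factor name: importance score}; every
    model is assumed to score the same set of factors.
    """
    ranks: Dict[str, List[int]] = {}
    for scores in importances.values():
        # Sort factors by descending score; Python's sort is stable, so ties keep input order.
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, factor in enumerate(ordered, start=1):
            ranks.setdefault(factor, []).append(position)
    averaged = [(factor, mean(positions)) for factor, positions in ranks.items()]
    # A smaller average rank means the factor was more important across models.
    return sorted(averaged, key=lambda item: item[1])


# Hypothetical scores from two hypothetical models.
example = {
    "model_1": {"factor_a": 0.5, "factor_b": 0.3, "factor_c": 0.2},
    "model_2": {"factor_a": 0.4, "factor_b": 0.1, "factor_c": 0.5},
}
print(average_importance_rank(example))
```

Averaging ranks rather than raw scores sidesteps the fact that different methods report importance on different scales.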
Conclusion:
In a group of women aged over 55 years, we demonstrated that machine learning methods estimate the T-score more accurately than MLR, with age being the most influential factor, followed by eGFR, BMI, UA, and education level.