Total hip arthroplasty (THA) and total knee arthroplasty (TKA) are among the most common surgical procedures and consume substantial hospital resources. Accurate prediction of the duration of surgery (DOS) can improve operating room scheduling and, in turn, the efficiency of hospital resource allocation. Currently, hospitals generally rely on historical averages or surgeons’ experience to predict DOS, both of which are prone to inaccuracy and personal bias. Moreover, DOS prediction for these procedures has received little attention in the literature. This paper aims to develop machine learning (ML) models that predict surgery duration for patients undergoing hip and knee arthroplasty from clinical and operational factors. Clinical and operational factors (n = 3,233) were extracted from Aalborg University Hospital’s database for the period 2017 to 2020. Three ML models (Extreme Gradient Boosting (XGBoost), Multilayer Perceptron, and Support Vector Machine) were developed, and their performance was evaluated and compared with a baseline model. XGBoost performed best among all models (Mean Absolute Error = 12.86, Root Mean Squared Error = 16.67, Buffer Accuracy = 68.73), and all three ML models outperformed the baseline. Feature importance analysis indicated that the surgeon, temporal factors, and surgery type contribute most to predicting DOS. In conclusion, ML models can improve the accuracy of DOS prediction for TKA and THA compared with current methods. As an ensemble learning method, XGBoost can better handle the complexities of health data related to DOS. In addition to medical features, operational factors contribute substantially to predicting DOS.
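As an illustration of the evaluation metrics named above, the following minimal sketch computes Mean Absolute Error, Root Mean Squared Error, and a buffer-style accuracy for a set of predicted durations and for a historical-mean baseline. It is not the study's code. The data values are invented, the function names are hypothetical, and the buffer accuracy definition used here (the percentage of cases whose absolute error falls within a fixed tolerance) is an assumption, since the abstract does not define the metric or specify the baseline.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted durations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def buffer_accuracy(actual, predicted, tolerance):
    """ASSUMED definition: percentage of cases with |error| <= tolerance."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return 100.0 * hits / len(actual)

# Invented example durations (not data from the study).
actual = [60, 75, 90, 120]
predicted = [65, 70, 100, 110]

# Hypothetical baseline: predict the historical mean for every case.
historical_mean = sum(actual) / len(actual)
baseline = [historical_mean] * len(actual)

model_mae = mean_absolute_error(actual, predicted)
model_rmse = root_mean_squared_error(actual, predicted)
model_buffer = buffer_accuracy(actual, predicted, tolerance=5)
baseline_mae = mean_absolute_error(actual, baseline)

print(f"model    MAE={model_mae:.2f} RMSE={model_rmse:.2f} buffer={model_buffer:.1f}")
print(f"baseline MAE={baseline_mae:.2f}")
```

In this toy example the model's MAE is lower than that of the mean-based baseline, which mirrors the kind of comparison the abstract reports without reproducing its numbers.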