Background: Non-alcoholic fatty liver (NAFL) can progress to the more severe subtype non-alcoholic steatohepatitis (NASH) and/or to fibrosis, both of which are associated with increased morbidity, mortality, and healthcare costs. Existing machine learning studies detect NASH; this study is unique in predicting the progression of NAFL patients to NASH or fibrosis. Aim: To use clinical information from NAFL-diagnosed patients to predict the likelihood of progression to NASH or fibrosis. Methods: Data were collected from the electronic health records of patients receiving a first-time NAFL diagnosis. Three models were developed: a gradient-boosted machine learning algorithm (XGBoost), logistic regression (LR), and a multi-layer perceptron (MLP). Hyperparameters, including maximum tree depth, learning rate, and number of estimators, were optimized using a grid search with five-fold cross-validation. Demographic features, vital signs, and laboratory measurements were used to predict which patients were likely to progress to NASH or fibrosis within 4 years of the initial NAFL diagnosis. Results: The XGBoost algorithm achieved area under the receiver operating characteristic curve (AUROC) values of 0.79 for prediction of progression to NASH and 0.87 for fibrosis on both the hold-out and external validation test sets. The XGBoost algorithm outperformed the LR and MLP models on all metrics for both NASH and fibrosis prediction. Conclusion: It is possible to accurately identify newly diagnosed NAFL patients at high risk of progression to NASH or fibrosis. Early identification of these patients may allow for increased clinical monitoring, more aggressive preventative measures to slow the progression of NAFL and fibrosis, and efficient clinical trial enrollment.
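
The grid search with five-fold cross-validation and the AUROC metric named in the Methods and Results can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: `fit_predict` is a hypothetical callable standing in for training a model such as XGBoost, the toy data and the `weight` parameter are invented for illustration, and the folds are contiguous rather than shuffled or stratified, which a real pipeline would likely require.

```python
from itertools import product
from statistics import mean


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k roughly equal, contiguous folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def grid_search_cv(fit_predict, X, y, param_grid, k=5):
    """Return (best_params, best_mean_auroc) over every combination in param_grid.

    fit_predict(params, X_train, y_train, X_test) -> list of scores is a
    hypothetical interface, assumed here for illustration only.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train_idx, test_idx in k_fold_indices(len(X), k):
            X_train = [X[i] for i in train_idx]
            y_train = [y[i] for i in train_idx]
            X_test = [X[i] for i in test_idx]
            y_test = [y[i] for i in test_idx]
            scores = fit_predict(params, X_train, y_train, X_test)
            fold_scores.append(auroc(y_test, scores))
        avg = mean(fold_scores)
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score


# Toy demonstration only: a one-parameter "model" that ignores the training
# data and scores each test sample as weight * feature.
def toy_fit_predict(params, X_train, y_train, X_test):
    return [params["weight"] * row[0] for row in X_test]


X_demo = [[float(i % 4)] for i in range(20)]
y_demo = [1 if i % 4 >= 2 else 0 for i in range(20)]
best_params, best_auroc = grid_search_cv(
    toy_fit_predict, X_demo, y_demo, {"weight": [-1.0, 0.0, 1.0]}
)
```

In a setting like the one described, the grid would instead span values for maximum tree depth, learning rate, and number of estimators, and `fit_predict` would train the classifier on each training fold before scoring the held-out fold.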