Introduction
Predicting medical science students’ performance on high-stakes examinations has received considerable attention. Machine learning (ML) models are well-known approaches for improving the accuracy of such predictions. Accordingly, we aim to provide a comprehensive framework and systematic review protocol for applying ML to predict medical science students’ performance on high-stakes examinations. Improving the current understanding of the input and output features, preprocessing methods, ML model settings and required evaluation metrics seems essential.

Methods and analysis
A systematic review will be conducted by searching the electronic bibliographic databases MEDLINE/PubMed, EMBASE, SCOPUS and Web of Science. The search will be limited to studies published from January 2013 to June 2023. Studies that explicitly predict student performance in high-stakes examinations, reference students’ learning outcomes and use ML models will be included. First, two team members will screen the literature against the inclusion criteria at the title, abstract and full-text levels. Second, the included literature will be rated using the Best Evidence Medical Education quality framework. Next, two team members will extract data, including each study’s general data and the details of its ML approach. Finally, consensus on the extracted information will be reached and the data submitted for analysis. The synthesised evidence from this review will provide helpful information for medical education policy-makers, stakeholders and other researchers in adopting ML models to evaluate medical science students’ performance in high-stakes examinations.

Ethics and dissemination
This systematic review protocol summarises the findings of existing publications rather than primary data and does not require an ethics review. The results will be disseminated in peer-reviewed journals.