ChatGPT has demonstrated significant potential in various aspects of medicine, including performance on licensing examinations. In this study, we systematically investigated ChatGPT’s performance on Iranian medical exams and assessed the quality of the included studies using a previously published assessment checklist. Across the included studies, reported accuracy ranged from 32% to 72% on basic science exams, from 34% to 68.5% on pre-internship exams, and from 32% to 84% on residency exams. Performance was generally higher when the input was provided in English rather than Persian. One study reported 40% accuracy on an endodontic board exam. To establish ChatGPT as a supplementary tool in medical education and clinical practice, we suggest developing dedicated guidelines and checklists to ensure high-quality, consistent research in this emerging field.