Purpose. The study explored the clinical influence, effectiveness, limitations, and human comparison outcomes of machine learning in diagnosing (1) dental diseases, (2) periodontal diseases, (3) trauma and neuralgias, (4) cysts and tumors, (5) glandular disorders, and (6) bone and temporomandibular joint disorders as possible causes of dental and orofacial pain. Method. Scopus, PubMed, and Web of Science (all databases) were searched by two reviewers up to 29 October 2020. Articles were screened against predefined eligibility criteria and narratively synthesized according to PRISMA-DTA guidelines. Articles that made direct reference test comparisons to human clinicians were evaluated using the MI-CLAIM checklist. The risk of bias was assessed by JBI-DTA critical appraisal, and the certainty of the evidence was evaluated using the GRADE approach. Information on the method used to quantify dental pain and disease, the conditional characteristics of the training and test data cohorts used for machine learning, diagnostic outcomes, and, where applicable, diagnostic test comparisons with clinicians was extracted. Results. Thirty-four eligible articles were included in the data synthesis, of which 8 made direct reference comparisons to human clinicians. Seven papers scored over 13 (out of the 15 points evaluated) on the MI-CLAIM checklist, and all papers scored 5 or more (out of 7) in the JBI-DTA appraisal. The GRADE approach revealed serious risks of bias and inconsistency, with most studies including a higher proportion of positive cases than the true prevalence in order to facilitate machine learning. Patient-perceived symptoms and clinical history were generally found to be less reliable than radiographs or histology for training accurate machine learning models. Low agreement between the clinicians training the models was suggested to reduce prediction accuracy.
In reference comparisons, nonspecialized clinicians with less than 3 years of experience were found to be at a disadvantage compared with trained models. Conclusion. Machine learning in dental and orofacial healthcare has shown respectable results in diagnosing diseases with symptomatic pain and, with improved future iterations, can be used as a diagnostic aid in clinics. The current review did not internally analyze the machine learning models or their respective algorithms, nor did it consider the confounding variables and factors that shape the orofacial disorders responsible for eliciting pain.