Background
In Kampo medicine, tongue examination is used to diagnose the pathological condition “Sho,” but no objective method for evaluating tongue diagnosis ability has been established. We constructed an electronic learning and evaluation system for tongue diagnosis based on a standardized tongue image database.

Purpose
This study aims to verify the practicality of this assessment system by evaluating the tongue diagnosis ability of Kampo specialists (KSs), medical professionals, and students.

Methods
In the first study, we analyzed the answer data of 15 KSs on an 80-question tongue diagnosis test that assesses eight aspects of tongue findings and evaluated the (i) test score, (ii) test difficulty and discrimination index, (iii) diagnostic consistency, and (iv) diagnostic match rate between KSs. In the second study, we administered a 20-question common Kampo test and analyzed the answer data of 107 medical professionals and 56 students to assess tongue color discrimination ability, evaluating the (v) correct answer rate, (vi) test difficulty, and (vii) factors related to the correct answer rate.

Results
In the first study, the average test score was 62.2 ± 10.7 points. Twenty-eight questions were difficult (correct answer rate, <50%), 34 were moderate (50%–85%), and 18 were easy (≥85%). For intrarater reliability, the average diagnostic match rate of the five KSs involved in database construction was 0.66 ± 0.08. For interrater reliability, agreement among the 15 KSs, measured by Gwet's agreement coefficient 1, was 0.52 (95% confidence interval, 0.38–0.65), indicating moderate agreement. In the second study, question difficulty was moderate, with a correct answer rate of 81.3% for medical professionals and 82.1% for students. The discrimination index was good for medical professionals (0.35) and poor for students (0.06).
Among medical professionals, those who answered this question correctly had a significantly higher total score on the common Kampo test than those who answered incorrectly (85.3 ± 8.4 points vs. 75.8 ± 11.8 points, p < 0.01).

Conclusion
This system can objectively evaluate tongue diagnosis ability and is highly practical. Using this system is expected to help improve learners’ tongue diagnosis ability and standardize tongue diagnosis.
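
To make two of the reported quantities concrete, the following is a minimal Python sketch of how an item's difficulty category and a multi-rater Gwet's agreement coefficient 1 (AC1) might be computed. It is an illustration only, not the authors' analysis code. The function names `classify_difficulty` and `gwet_ac1` are hypothetical, the difficulty cutoffs are taken from the abstract (with exactly 85% treated as easy), and the AC1 function uses the standard multi-rater formula under the assumption that every rater rates every item.

```python
from collections import Counter


def classify_difficulty(correct_rate):
    """Bucket an item by its correct answer rate (0.0 to 1.0).

    Cutoffs follow the abstract: <50% difficult, 50%-85% moderate,
    >=85% easy. Treating exactly 85% as easy is an assumption.
    """
    if correct_rate < 0.50:
        return "difficult"
    if correct_rate < 0.85:
        return "moderate"
    return "easy"


def gwet_ac1(ratings, categories):
    """Gwet's AC1 for several raters and nominal categories.

    ratings: one list per item, holding one label per rater.
    categories: every label a rater could have chosen (at least two).
    Assumes each item was rated by at least two raters.
    """
    n_items = len(ratings)
    n_categories = len(categories)
    agreement_sum = 0.0
    prevalence_sum = {k: 0.0 for k in categories}

    for item in ratings:
        n_raters = len(item)
        counts = Counter(item)
        # Share of rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))
        # Share of raters choosing each category on this item.
        for k in categories:
            prevalence_sum[k] += counts.get(k, 0) / n_raters

    observed = agreement_sum / n_items
    # Chance agreement as defined for AC1.
    chance = sum(
        (p / n_items) * (1 - p / n_items) for p in prevalence_sum.values()
    ) / (n_categories - 1)
    return (observed - chance) / (1 - chance)


if __name__ == "__main__":
    # Toy data: three raters, four items, two hypothetical color labels.
    toy = [
        ["pale", "pale", "pale"],
        ["red", "red", "pale"],
        ["red", "red", "red"],
        ["pale", "red", "pale"],
    ]
    print(classify_difficulty(0.62))
    print(round(gwet_ac1(toy, ["pale", "red"]), 3))
```

The sketch returns only a point estimate; a confidence interval such as the one reported above would require a variance estimate, which is omitted here.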