The National Institutes of Health (NIH) Oral chronic Graft-versus-Host Disease (cGVHD) Activity Assessment Instrument is intended to be simple to use and to provide a reproducible, objective measure of disease activity over time. The objective of this study was to assess inter- and intraobserver variability in the component and composite scores in patients evaluated with oral cGVHD. Twenty-four clinicians (bone marrow transplant [BMT] oncologists [BMTE], n = 16; BMT midlevel providers [BMT MLP], n = 4; and oral medicine experts [OME], n = 4) from 6 major transplant centers scored high-quality intraoral photographs of 12 patients. The same photographs were evaluated 1 week later by the same evaluators. An intraclass correlation coefficient (ICC) was used to calculate intrarater reliability, and interrater agreement was analyzed using a weighted kappa statistic: 0 or=0.90) and highest for ulcers (0.97, 0.85, 0.94). Although 75% of OME were comfortable with their ability to score the cases, approximately 50% of BMTE and BMT MLP were uncomfortable. The majority felt that their evaluations were accurate; however, 84% agreed that formal training is required. Interrater variability of the oral cGVHD instrument is unacceptable for the purposes of clinical trials. Greater concordance among OME, high intrarater reliability, and participant feedback suggest that formal training may significantly decrease variability. Parallel investigations must be completed using the other organ-specific instruments prior to any revision and widespread prospective utilization of these tools as research endpoints.
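For readers unfamiliar with the agreement measure named above, the following is a minimal sketch of Cohen's weighted kappa for two raters scoring the same cases on an ordinal scale. It is illustrative only and is not the study's analysis code: the function name `weighted_kappa`, the choice of linear (or quadratic) disagreement weights, and the 0 to 3 example scale are assumptions, since the abstract does not state the weighting scheme or how agreement was pooled across the 24 raters.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordered category scale.

    `categories` lists the possible scores in ascending order.
    `weights` is "linear" or "quadratic" (an assumption; the source
    abstract does not say which scheme was used).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of cases")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint proportion of cases scored i by rater A and j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of categories.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both raters use a single category")

    # 1 = perfect agreement, 0 = agreement no better than chance.
    return 1.0 - obs / exp


# Hypothetical scores from two raters on six photographs (0 to 3 scale).
scores_a = [0, 1, 2, 3, 0, 1]
scores_b = [0, 1, 2, 3, 1, 1]
kappa = weighted_kappa(scores_a, scores_b, categories=[0, 1, 2, 3])
```

Weighting penalizes a two-point disagreement more than a one-point disagreement, which is why a weighted statistic suits ordinal severity scores better than unweighted kappa.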