We sought to review our experience with salivary mucoepidermoid carcinoma (MEC) over two decades, to confirm the validity and reproducibility of histologic grading, and to investigate the MIB-1 index as a prognosticator. The diagnosis was confirmed in 80 cases, and chart review or patient contact was achieved for 48 patients, with follow-up ranging from 5 to 240 months (median 36 months). Immunohistochemistry for MIB-1, with citrate antigen retrieval, was performed on a subset of cases. Kaplan-Meier survival curves were generated for each stage, site, and grade according to our proposed grading system. To address the issue of grading reproducibility, 20 slides were circulated among five observers without prior discussion; slides were categorized as low-, intermediate-, or high-grade, first according to each observer's "own" criteria and then according to the AFIP criteria proposed by Goode et al.10 Weighted kappa (κ) estimates were obtained to describe the extent of agreement between pairs of raters. Variation across ratings was tested with the Wilcoxon signed rank test or the Friedman test, as appropriate. There was no gender predominance, and the age range was wide (15-86 years; median 49 years). The two most common sites were the parotid and the palate. All grade 1 MECs presented as Stage I tumors, and no failures were seen in this category. The local disease failure rates at 75 months for grade 2 and grade 3 MEC were 30% and 70%, respectively. Tumor grade, stage, and negative margin status all correlated with disease-free survival (DFS) (p = 0.0091, 0.0002, and 0.048, respectively). The MIB-1 index was not found to be predictive of grade. Regarding the reproducibility of grading, interobserver variation for pathologists using their "own" grading, as expressed by the kappa value, ranged from good agreement (κ = 0.79) to poor (κ = 0.27) (average κ = 0.49). Somewhat better interobserver reproducibility was achieved when the pathologists applied the standardized AFIP criteria (average κ = 0.61; range 0.38-0.77). This greater agreement was also reflected in the Friedman test (statistical testing of interobserver equality), which indicated significant differences when observers used their own grading systems (p = 0.0001) but not when they applied the AFIP "standardized" grading (p = 0.33). When each observer's own grading was compared with the AFIP grading, there were 100 pairs of grading "events," with 46 disagreements per 100 pairs. In 98% of disagreements, the AFIP grading "downgraded" tumors. This led us to reanalyze a subset of 31 patients for DFS versus grade, comparing our grading schema with the AFIP grading. Although statistical significance was not achieved for this subset, the log-rank test revealed a trend for our grading (p = 0.0993) compared with the Goode schema (p = 0.2493). This clinicopathologic analysis confirms the predictive value of tumor staging and three-tiered histologic grading. Our grading exercise confirms that there is significant grading disparity for MEC, even among experienced ENT/oral pathologists. The improved reproducibility obtained when the well-defined AFIP criteria were applied argues for standardized grading; however, given the tendency of the AFIP schema to downgrade tumors and the stronger prognostic trend of our schema, we favor our proposed grading criteria.
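For readers who wish to reproduce this type of agreement analysis, the sketch below illustrates how pairwise weighted kappa values of the kind reported above can be computed. The observer names and grade assignments are hypothetical placeholders rather than the study data, and the use of scikit-learn's linearly weighted kappa is an assumption for illustration, not a description of the original statistical software.

    from itertools import combinations
    from statistics import mean

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical example only: grades (1 = low, 2 = intermediate, 3 = high)
    # assigned by each observer to the same set of slides. The study used
    # 5 observers and 20 slides; the values below are placeholders.
    ratings = {
        "observer_A": [1, 2, 2, 3, 1, 2, 3, 3, 1, 2],
        "observer_B": [1, 2, 3, 3, 1, 2, 2, 3, 1, 2],
        "observer_C": [1, 1, 2, 3, 2, 2, 3, 3, 1, 1],
    }

    # Weighted kappa for every pair of observers; linear weights penalize
    # low-vs-high disagreements more heavily than adjacent-grade disagreements.
    pairwise_kappas = {
        (a, b): cohen_kappa_score(ratings[a], ratings[b], weights="linear")
        for a, b in combinations(ratings, 2)
    }

    for (obs1, obs2), kappa in pairwise_kappas.items():
        print(f"{obs1} vs {obs2}: kappa = {kappa:.2f}")

    print(f"average kappa = {mean(pairwise_kappas.values()):.2f}")

Quadratic weights are a common alternative to linear weights for ordinal grades; either choice penalizes a low-grade versus high-grade disagreement more than a disagreement between adjacent grades.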