Clinicians use descriptive classification systems when treating patients with low back pain as an adjunct to surgical decision making. Magnetic resonance imaging (MRI) changes, including Modic changes, the presence of a high-intensity zone, and internal disk desiccation, are commonly used descriptors. The question remains whether different clinicians interpret these terms similarly. This study evaluated the inter- and intraobserver reliability of commonly used MRI classifications in patients presenting with low back pain.

Sixty-six patients who underwent lumbar spine fusion surgery for degenerative disk disease at a single multiphysician spine specialty practice were identified. For each surgical level, 3 fellowship-trained spine surgeons independently determined the following MRI variables: presence or absence of a high-intensity zone and/or internal disk desiccation, presence and classification of disk herniation, Modic grade, and disk height. Each surgeon reviewed the same set of MRI studies a second time at least 2 weeks after the first reading. Inter- and intraobserver reliability was determined using multiobserver kappa coefficients. Intraobserver reliability ranged from 0.563 to 0.988, with the greatest agreement in determining disk height. The greatest interobserver agreement was for determining Modic changes (0.819).

Controversy remains over the criteria for diagnosing degenerative disk disease. In patients presenting with low back pain and diagnosed with degenerative disk disease, the inter- and intraobserver reliability of several common MRI diagnostic tools was substantial. These data imply that clinicians interpret these findings reproducibly and use these terms similarly.
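
To illustrate how a multiobserver kappa coefficient of the kind reported above can be computed, the following is a minimal sketch of Fleiss' kappa. The abstract does not state which multiobserver kappa variant was used, so the choice of Fleiss' kappa is an assumption, as are the function name `fleiss_kappa` and the example ratings, which are invented for illustration and are not study data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated once by every rater.

    `ratings` is a list of rows; each row holds one category label per rater.
    This is an illustrative sketch, not the study's actual analysis code.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    # counts[i][j] = number of raters who assigned subject i to category j
    counts = []
    for row in ratings:
        tally = Counter(row)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean proportion of agreeing rater pairs per subject
    per_subject = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of ratings in each category
    total = n_subjects * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical Modic grades assigned by 3 raters to 4 disk levels (made-up values)
example_ratings = [
    ["I", "I", "I"],
    ["II", "II", "I"],
    ["none", "none", "none"],
    ["II", "II", "II"],
]
example_kappa = fleiss_kappa(example_ratings)
print(f"Fleiss' kappa: {example_kappa:.3f}")
```

The same function could serve for intraobserver reliability under the same assumption, by treating one surgeon's first and second readings of each study as two ratings per subject.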