Study Design: Retrospective comparison of magnetic resonance imaging (MRI) grading by experts and by deep convolutional neural networks (CNNs).

Objective: Deep learning is increasingly applied to clinical diagnosis. It can accelerate image interpretation and serve as a screening tool to assist physicians.

Methods: Retrospective MRI grading performed by experts was compared with grading produced by CNN classifiers. Data were drawn from the lumbar axial dataset in DICOM format. Two experts labeled the sampled images using the same diagnostic tools: localization of patches near the spinal canal, rootlet leveling, and stenosis grading. Comparisons are reported for both rootlet-cord classification and stenosis grading.

Results: For rootlet-cord classification, agreement between the two analyzers was 90.3%, with an F1 score of 86.6%. Analyzer-classifier agreement was 92.7% and 96.8%, with F1 scores of 90.6% and 95.6%, respectively. For stenosis grading, the two analyzers agreed on 89.2% of the data, with an F1 score of 76.5%. Analyzer-classifier grades agreed on 91.5% and 89.4% of the data, with F1 scores of 78.4% and 75.7%, respectively. Analyzer1 and Analyzer2 each classified more than 74% of the data as grade A (78.8% and 74.4%, respectively), 15.4% and 18.6% as grade B, 4.2% and 6.0% as grade C, and 1.6% and 2.0% as grade D.

Conclusions: The fully automated deep learning model showed competitive results in stenosis grading and rootlet-cord classification under similar anatomical conditions. However, abrupt anatomical changes can make a diagnosis based on images alone ambiguous.
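To make the reported metrics concrete, the following minimal sketch shows one way the agreement rate and an F1 score between two graders could be computed from per-image stenosis grades. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say how F1 was averaged across grades, so macro averaging is assumed here, and the function names and toy labels are hypothetical and are not study data.

```python
# Illustrative sketch only: agreement and macro-averaged F1 between two graders.
# Assumptions (not from the abstract): macro averaging, and treating the first
# grader as the reference when counting false positives and false negatives.

GRADES = ("A", "B", "C", "D")  # stenosis grades named in the abstract


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two graders assign the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def macro_f1(reference, predicted, classes=GRADES):
    """Unweighted mean of per-class F1, skipping classes absent from both lists."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    scores = []
    for c in classes:
        tp = sum(r == c and p == c for r, p in zip(reference, predicted))
        fp = sum(r != c and p == c for r, p in zip(reference, predicted))
        fn = sum(r == c and p != c for r, p in zip(reference, predicted))
        denom = 2 * tp + fp + fn
        if denom == 0:
            continue  # class never appears in either list
        scores.append(2 * tp / denom)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy labels for six images (not taken from the study).
analyzer1 = ["A", "A", "B", "C", "D", "A"]
analyzer2 = ["A", "B", "B", "C", "D", "A"]

print(f"agreement: {percent_agreement(analyzer1, analyzer2):.3f}")
print(f"macro F1:  {macro_f1(analyzer1, analyzer2):.3f}")
```

Because grade A dominates the data (more than 74% for both analyzers), agreement can be high while a class-averaged F1 stays noticeably lower, which is consistent with the gap between the agreement and F1 figures reported for stenosis grading.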