Background: Artificial intelligence is gaining traction in automated medical imaging analysis. More accurate magnetic resonance imaging (MRI) predictors of successful clinical outcomes are needed to better define indications for surgery, improve clinical outcomes with targeted minimally invasive and endoscopic procedures, and realize cost savings by avoiding more invasive spine care.

Objective: To demonstrate the ability of deep learning neural network models to identify features in MRI DICOM datasets that represent varying intensities or severities of common spinal pathologies and injuries, and to demonstrate the feasibility of generating automated verbal MRI reports comparable to those produced by reading radiologists.

Methods: A 3-dimensional (3D) anatomical model of the lumbar spine was fitted to each patient's MRI by a team of technicians. MRI T1, T2, sagittal, axial, and transverse reconstruction image series were used to train segmentation models by intersecting the 3D model with these image sequences. Class definitions for the central canal were extracted from the radiologist report: (0) no disc bulge/protrusion/canal stenosis, (1) disc bulge without canal stenosis, (2) disc bulge resulting in canal stenosis, and (3) disc herniation/protrusion/extrusion resulting in canal stenosis. The left and right neural foramina were each graded as (0) neural foraminal stenosis absent or (1) neural foraminal stenosis present. Reporting criteria for the pathologies at each disc level and, when available, the grading of severity were extracted, and a natural language processing model was used to generate a verbal and written report.
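As an illustration only, the labeling scheme described above could be encoded as follows. This is a minimal sketch and not the study's implementation: the names `CentralCanal`, `DiscLevelLabel`, and `targets` are hypothetical, and only the class numbering is taken from the text.

```python
from dataclasses import dataclass
from enum import IntEnum


class CentralCanal(IntEnum):
    """Central canal classes, numbered as in the Methods."""
    NORMAL = 0                    # no disc bulge/protrusion/canal stenosis
    BULGE_NO_STENOSIS = 1         # disc bulge without canal stenosis
    BULGE_WITH_STENOSIS = 2       # disc bulge resulting in canal stenosis
    HERNIATION_WITH_STENOSIS = 3  # herniation/protrusion/extrusion with canal stenosis


@dataclass(frozen=True)
class DiscLevelLabel:
    """Hypothetical per-disc-level record; field names are assumptions."""
    level: str                        # e.g. "L4-L5"
    central_canal: CentralCanal
    left_foraminal_stenosis: bool     # False = absent (0), True = present (1)
    right_foraminal_stenosis: bool

    def targets(self) -> tuple:
        """Return (central canal class, left foramen, right foramen) as integers."""
        return (
            int(self.central_canal),
            int(self.left_foraminal_stenosis),
            int(self.right_foraminal_stenosis),
        )


label = DiscLevelLabel("L4-L5", CentralCanal.BULGE_WITH_STENOSIS, False, True)
print(label.targets())
```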
These data were then used to train a set of very deep convolutional neural network models, minimizing binary cross-entropy for each classification.

Results: Initial validation of the deep learning algorithm's predictions was performed on 20% of the dataset that was held out from training. Of the 17,800 total disc locations for which MRI images and radiology reports were available, 14,720 were used to train the model, and 3560 were used for validation. Validation accuracy for the foraminal stenosis detector converged at 81% (sensitivity = 72.4%, specificity = 83.1%) after 25 epochs (complete iterations through the entire training dataset). The accuracy was 86.2% (sensitivity = 91.1%, specificity = 82.5%) for the central stenosis detector and 85.2% (sensitivity = 81.8%, specificity = 87.4%) for the disc herniation detector.

Conclusions: Deep learning algorithms may be used for routine reporting in spine MRI. There was minimal disparity among accuracy, sensitivity, and specificity, indicating that the models were not overfitted to the training set. We concluded that variability in the training data tends to reduce overfitting and overtraining as the deep neural network models learn to focus on the common pathologies. Future studies should demonstrate th...