We investigated the performance improvement achieved in a deep-learning inference processor by changing its cache memory from SRAM to spin-orbit torque magnetoresistive random-access memory (SOT-MRAM). Implementing SOT-MRAM doubled the memory capacity within the same area as SRAM, which is also expected to reduce main-memory traffic without increasing the chip area, thereby lowering the energy consumption. As a case study, we simulated how much the performance could be improved by replacing SRAM with MRAM in a deep-learning processor. The NVIDIA Deep Learning Accelerator (NVDLA) was used as the motif processor, and SegNet and U-Net were used as the target networks for a segmentation task with an image size of 512 × 1024 pixels. We evaluated the NVDLA with a 512-KB buffer and cache sizes of 1, 2, 4, and 8 MB as its on-chip memory, replacing both of these memories with MRAM implementations. When both the buffer and the cache were replaced with SOT-MRAM, the energy consumption and the execution time were reduced by 18.6% and 17.9%, respectively, and the performance per unit area was improved by more than 36.4%. In contrast, replacing SRAM with spin-transfer torque MRAM (STT-MRAM) is not suitable for inference devices, because its slow write operation significantly worsens the latency.
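To make the kind of trade-off described above concrete, the sketch below shows a first-order energy/latency/area comparison between an SRAM and an SOT-MRAM on-chip memory. It is a minimal illustration only: the class names, per-access energies and latencies, area densities, and access counts are hypothetical placeholders, not values from this study, from NVDLA, or from any measured technology.

```python
# Minimal first-order sketch of an SRAM vs. SOT-MRAM on-chip memory comparison.
# All numbers below are HYPOTHETICAL placeholders for illustration only.

from dataclasses import dataclass


@dataclass
class MemoryTech:
    name: str
    read_energy_pj: float    # energy per read access (pJ) -- assumed
    write_energy_pj: float   # energy per write access (pJ) -- assumed
    read_latency_ns: float   # latency per read access (ns) -- assumed
    write_latency_ns: float  # latency per write access (ns) -- assumed
    area_mm2_per_mb: float   # array area per MB (mm^2) -- assumed


def evaluate(tech: MemoryTech, capacity_mb: float, reads: int, writes: int) -> dict:
    """Accumulate simple totals for one on-chip memory configuration."""
    energy_uj = (reads * tech.read_energy_pj + writes * tech.write_energy_pj) * 1e-6
    time_ms = (reads * tech.read_latency_ns + writes * tech.write_latency_ns) * 1e-6
    area_mm2 = capacity_mb * tech.area_mm2_per_mb
    return {
        "tech": tech.name,
        "energy_uJ": round(energy_uj, 1),
        "time_ms": round(time_ms, 1),
        "area_mm2": round(area_mm2, 2),
        "perf_per_area": round(1.0 / (time_ms * area_mm2), 4),
    }


# Hypothetical technologies: SOT-MRAM assumed ~2x denser, slightly costlier writes.
sram = MemoryTech("SRAM", 0.5, 0.5, 1.0, 1.0, 1.0)
sot_mram = MemoryTech("SOT-MRAM", 0.4, 0.8, 1.2, 1.5, 0.5)

# Hypothetical access counts for one inference of a segmentation network.
reads, writes = 50_000_000, 20_000_000

for tech in (sram, sot_mram):
    print(evaluate(tech, capacity_mb=4, reads=reads, writes=writes))
```

In such a first-order model, the denser SOT-MRAM array can either shrink the memory area for the same capacity or double the capacity in the same area; the latter would additionally cut off-chip traffic, which this simple per-access sketch does not capture.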