Short-term residential load forecasting plays a crucial role in smart grids, ensuring an optimal match between energy demands and generation. With the inherent volatility of residential load patterns, deep learning has gained attention due to its ability to capture complex nonlinear relationships within hidden layers. However, most existing studies have relied on default loss functions such as mean squared error (MSE) or mean absolute error (MAE) for neural networks. These loss functions, while effective in overall prediction accuracy, lack specialized focus on accurately predicting load peaks. This article presents a comparative analysis of soft-DTW loss function, a smoothed formulation of Dynamic Time Wrapping (DTW), compared to other commonly used loss functions, in order to assess its effectiveness in improving peak prediction accuracy. To evaluate peak performance, we introduce a novel evaluation methodology using confusion matrix and propose new errors for peak position and peak load, tailored specifically for assessing peak performance in short-term load forecasting. Our results demonstrate the superiority of soft-DTW in capturing and predicting load peaks, surpassing other commonly used loss functions. Furthermore, the combination of soft-DTW with other loss functions, such as soft-DTW + MSE, soft-DTW + MAE, and soft-DTW + TDI (Time Distortion Index), also enhances peak prediction. However, the differences between these combined soft-DTW loss functions are not substantial. These findings highlight the significance of utilizing specialized loss functions, like soft-DTW, to improve peak prediction accuracy in short-term load forecasting.