Deep learning‐based structural response estimation can lose accuracy as the noise level in the test data increases, because the noisy test data diverge from the training data. This study proposes a short‐time Fourier transform‐based long short‐term memory (STFT‐LSTM) model to maintain estimation performance and robustness in the presence of noise. An STFT layer placed before the LSTM layer transforms the input into the time‐frequency domain, and the LSTM layer learns from this transformed output. The proposed model was validated numerically using a three‐degree‐of‐freedom model at various signal‐to‐noise ratios, and its robustness against impulse and periodic noise was also verified. Experimental validation assessed the estimation robustness under impact loading and confirmed robustness against environmental noise in the acquired acceleration response.
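
The STFT front end described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the Hann window, the frame length, the hop size, the use of magnitude spectra, and the names `stft_frames`, `frame_len`, and `hop` are all assumptions made for illustration. The sketch only shows how a time‐domain acceleration record might be turned into a sequence of time‐frequency frames, each of which would serve as one time step of the LSTM input.

```python
import cmath
import math


def stft_frames(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum per Hann-windowed frame.

    frame_len and hop are illustrative values, not taken from the study.
    """
    # Periodic Hann window (assumed; the window type is not stated in the text).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # one-sided spectrum of a real signal
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed segment; magnitude only (assumed).
        spectrum = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


# Hypothetical input: a 16 Hz sinusoid sampled at 256 Hz (512 samples).
fs = 256.0
accel = [math.sin(2 * math.pi * 16.0 * i / fs) for i in range(512)]

# features[t][k] is the magnitude at time frame t and frequency bin k.
# An LSTM layer would consume this as a sequence of len(features) steps,
# each with len(features[0]) input features.
features = stft_frames(accel)
```

With these assumed settings the frequency resolution is `fs / frame_len` = 4 Hz, so the 16 Hz component falls in bin 4 of every frame. A direct DFT is used here only to keep the sketch dependency‐free; in practice an FFT‐based routine would be the usual choice.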