Robots execute diverse load operations, including carrying, lifting, tilting, and moving objects, involving load changes or transfers. This dynamic process can result in the shift of interactive operations from stability to instability. In this paper, we respond to these dynamic changes by utilizing tactile images captured from tactile sensors during interactions, conducting a study on the dynamic stability and instability in operations, and propose a real-time dynamic state sensing network by integrating convolutional neural networks (CNNs) for spatial feature extraction and long short-term memory (LSTM) networks to capture temporal information. We collect a dataset capturing the entire transition from stable to unstable states during interaction. Employing a sliding window, we sample consecutive frames from the collected dataset and feed them into the network for the state change predictions of robots. The network achieves both real-time temporal sequence prediction at 31.84 ms per inference step and an average classification accuracy of 98.90%. Our experiments demonstrate the network’s robustness, maintaining high accuracy even with previously unseen objects.