The fire heat release rate (HRR) is a key parameter for describing the combustion process and its thermal effects. In recent years, several studies have used fire scene images and deep learning algorithms to predict fire HRR in real time, making HRR prediction more lightweight and better suited to real-time monitoring. However, monitoring fires at an early stage and predicting future HRR from data available at the current moment are essential for evaluating the scale of indoor fires and for improving fire prevention and control. This paper proposes a deep learning model that predicts the future transient fire HRR from continuous fire scene images (containing both flame and smoke features) and their time-series information. The model, Att-BiLSTM, comprises three bidirectional long short-term memory (Bi-LSTM) layers and one attention layer: the Bi-LSTM layers extract features in both temporal directions, and the attention mechanism then highlights the image features that most strongly influence the prediction. A large-scale dataset is constructed from 27,231 fire scene images with instantaneous HRR annotations, collected from 40 different fire trials in the NIST database. The experimental results show that Att-BiLSTM effectively uses fire scene image features and temporal information to accurately predict future transient HRR, including in high-brightness fire environments and with complex fire sources. This work offers new insights and methods for fire monitoring and emergency response.
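To illustrate how an attention layer can emphasize the most informative time steps in a sequence of per-frame features, the following is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not state the attention formulation, so dot-product scoring against a learned query vector is assumed here, and the names `attention_pool`, `predict_hrr`, `query`, `w_out`, and `b_out` are hypothetical. The Bi-LSTM outputs are represented by plain lists of numbers.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Collapse per-time-step feature vectors into one context vector.

    hidden_states: one vector per time step (standing in for Bi-LSTM outputs).
    query: hypothetical learned scoring vector of the same length (assumption).
    Returns (context_vector, attention_weights).
    """
    # Score each time step by its dot product with the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum over time: steps with larger weights dominate the context.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


def predict_hrr(context: Sequence[float], w_out: Sequence[float], b_out: float) -> float:
    """Hypothetical linear head mapping the context vector to a scalar HRR value."""
    return sum(c * w for c, w in zip(context, w_out)) + b_out


# Usage with made-up feature vectors for three consecutive frames.
states = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8]]
context, weights = attention_pool(states, query=[1.0, 1.0])
future_hrr = predict_hrr(context, w_out=[10.0, 20.0], b_out=5.0)
```

In this sketch the attention weights are non-negative and sum to one, so the context vector is a convex combination of the per-frame features; in a trained model, the query vector and the output head would be learned jointly with the recurrent layers.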