In this article, a new deep learning-based approach for online estimation of damage size and remaining useful life of structures is presented. The proposed approach consists of three modules. In the first module, a long short-term memory regression model is used to construct sensor-based estimations of the damage size, and the effect of different ranges of temporal correlations on the accuracy of these estimations is examined. In the second module, a convolutional neural network semantic image segmentation approach is used to construct automated damage size estimations by carrying out a pixel-wise classification on images of the damaged areas. Frequency mismatches between the sensor- and image-based size estimations are resolved using physics-of-failure relations. Finally, in the third module, the damage size estimations obtained from the first two modules are fused for online estimation of the remaining useful life of the structure. The performance of the proposed approach is evaluated using sensor and image data obtained from a set of fatigue crack experiments performed on aluminum alloy 7075-T6 specimens. Using acoustic emission signals obtained from sensors and microscopic images from these experiments, it is shown that the damage size estimations obtained from the proposed data fusion approach have higher accuracy than the sensor-based estimations and higher frequency than the image-based estimations. Moreover, for the experiment with the largest sensor dataset, the accuracy of the data fusion estimations is found to be higher than that of the image-based estimations. Based on these results, it is concluded that considering longer temporal correlations can improve the accuracy of crack size estimations and, thus, lead to a better remaining useful life estimation for structures.
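
As an illustration of how a pixel-wise classification can be turned into a damage size, the following minimal sketch converts a binary segmentation mask into a crack length. It is not the implementation used in this work: the function name `crack_length_from_mask`, the `mm_per_pixel` calibration factor, and the definition of crack length as the horizontal extent of crack-labelled pixels are all assumptions made for illustration only, and the mask is a toy array rather than network output.

```python
import numpy as np


def crack_length_from_mask(mask, mm_per_pixel):
    """Return an illustrative crack length (in mm) from a binary mask.

    Assumption: the crack grows roughly along the image's horizontal axis,
    so its length is taken as the horizontal extent of crack-labelled pixels.
    `mm_per_pixel` is a hypothetical calibration factor of the microscope.
    """
    mask = np.asarray(mask, dtype=bool)
    # Indices of the columns that contain at least one crack pixel.
    cols = np.flatnonzero(mask.any(axis=0))
    if cols.size == 0:
        return 0.0  # no pixel was classified as crack
    extent_in_pixels = cols[-1] - cols[0] + 1
    return float(extent_in_pixels * mm_per_pixel)


# Toy mask standing in for a segmentation output (True = crack pixel).
mask = np.zeros((5, 10), dtype=bool)
mask[2, 1:7] = True
mask[3, 6:8] = True

length_mm = crack_length_from_mask(mask, mm_per_pixel=0.05)
print(f"estimated crack length: {length_mm:.2f} mm")
```

Measuring the horizontal extent rather than counting crack pixels keeps the estimate insensitive to the apparent crack width, which is one plausible design choice; a path-length measure along the crack skeleton would be an alternative for strongly curved cracks.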