On-line fatigue crack evaluation is crucial for ensuring structural safety and reducing the maintenance costs of safety-critical systems. Among structural health monitoring (SHM) techniques, guided wave (GW)-based SHM is regarded as one of the most promising. However, traditional damage-index-based and machine learning methods require manual processing and selection of GW features, which depends heavily on expert knowledge and is easily affected by complicated uncertainties. This paper therefore proposes a GW-based fatigue crack evaluation framework that combines a convolutional neural network (CNN) ensemble with a differential wavelet spectrogram. The differential time–frequency spectrogram between the baseline signal and the monitoring signal, computed with the complex Gaussian wavelet transform, serves as the CNN input. An ensemble of CNNs is then trained to jointly determine the crack length. Real fatigue tests on complex lap joint structures were carried out to validate the proposed method: several structures were tested beforehand to collect the training dataset, and a separate structure was used for testing. The root mean square error on the training dataset was 1.4 mm, and the root mean square error of the evaluated crack length in the testing lap joint structure was 1.7 mm, showing the effectiveness of the proposed method.
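The processing chain described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work and rests on several assumptions: the differential spectrogram is taken as the wavelet magnitude of the monitoring signal minus the baseline signal, the wavelet is a first-order complex Gaussian implemented by direct convolution, the ensemble output is the plain mean of the member predictions, and the trained CNNs are replaced by placeholder callables. All function names, scales, and synthetic signals are hypothetical.

```python
import numpy as np


def cgau1(t):
    """Unnormalized first-order complex Gaussian wavelet: d/dt exp(-1j*t - t**2)."""
    return (-1j - 2.0 * t) * np.exp(-1j * t - t**2)


def cwt_magnitude(signal, scales, dt=1.0, support=5.0):
    """Magnitude of a continuous wavelet transform, one row per scale."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    rows = []
    for s in scales:
        half = int(np.ceil(support * s / dt))
        t = np.arange(-half, half + 1) * dt / s
        # Correlation with the wavelet, written as a convolution with the
        # time-reversed complex conjugate.
        kernel = np.conj(cgau1(t))[::-1] / np.sqrt(s)
        full = np.convolve(signal, kernel, mode="full") * dt
        rows.append(np.abs(full[half:half + n]))  # crop back to signal length
    return np.vstack(rows)


def differential_spectrogram(baseline, monitoring, scales, dt=1.0):
    """Assumed definition: wavelet magnitude of (monitoring - baseline)."""
    diff = np.asarray(monitoring, dtype=float) - np.asarray(baseline, dtype=float)
    return cwt_magnitude(diff, scales, dt)


def ensemble_predict(models, image):
    """Assumed combination rule: mean of the individual model outputs."""
    return float(np.mean([m(image) for m in models]))


def rmse(predicted, actual):
    """Root mean square error between predicted and actual crack lengths."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


# Synthetic stand-ins for a baseline and a monitoring GW signal.
k = np.arange(256)
baseline_signal = np.exp(-((k - 80) / 15.0) ** 2) * np.sin(0.5 * k)
scatter = 0.2 * np.exp(-((k - 150) / 15.0) ** 2) * np.sin(0.5 * k)
monitoring_signal = baseline_signal + scatter

image = differential_spectrogram(baseline_signal, monitoring_signal,
                                 scales=[1.0, 2.0, 4.0, 8.0])

# Placeholder "models": each maps a spectrogram to a crack length in mm.
placeholder_models = [lambda im, g=g: g * im.max() for g in (9.0, 10.0, 11.0)]
estimated_length = ensemble_predict(placeholder_models, image)
```

In this sketch the spectrogram is identically zero when the monitoring signal equals the baseline, so only crack-induced changes reach the network input. Because magnitudes are taken after the subtraction, the result generally differs from subtracting two separately computed magnitude spectrograms; which variant applies here is not specified in the text above.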